
I am trying to convert SAFECAST API JSON to CSV suitable for spreadsheet programs like Excel so I can prepare some data summaries. I have a semi-manual wget script that downloads several JSON files based on my filters.

I am able to convert a single JSON file to a usable CSV using jq (Ubuntu Linux):

jq -r '["id", "user_id", "source url", "status", "submitted", "measurements_count", "approved", "created_at", "updated_at", "name", "description", "lines_count", "credits", "height", "orientation", "cities", "subtype", "rejected"], (.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected]) | @csv' bgeigie_imports_001.json > bgeigie_imports_001.csv

but I am unable to make it run automatically over all the JSON files in a folder (this thread did not help).

Any ideas? Loading the input/output file names from a txt file might also help, as I have a list from the wget script…

I appreciate any advice; unfortunately my skills are limited.

Thanks

PS: merging the JSON files first using jq -s '.' *.json > All_data.json produces a differently structured file, and the jq command above then fails to work properly; presumably -s wraps each file's top-level array inside one outer array, so .[] yields arrays instead of objects.
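
If that's the case, flattening with add instead should restore the expected shape (a single flat array of record objects):

# -s reads every input into one array; "add" concatenates the
# per-file arrays into a single flat array of record objects.
jq -s 'add' *.json > All_data.json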

2 Answers


  1. From your post it's unclear how many JSON files you have, how they correspond to the API link, and what they contain, so I can't produce a tested, working example; you also mentioned that the final goal is a single CSV. I'd expect something like the following to work for you:

    find . -maxdepth 1 -type f -name '*.json' -print0 | xargs -0 -I"^" jq -r '["id", "user_id", "source url", "status", "submitted", "measurements_count", "approved", "created_at", "updated_at", "name", "description", "lines_count", "credits", "height", "orientation", "cities", "subtype", "rejected"], (.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected]) | @csv' "^" >> data.csv
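
    Note that this filter prints the header row once per input file, so the combined data.csv will contain repeated header lines. A variant that writes the header only once (assuming, as in your single-file example, that each file's top level is an array of records):

    # Write the header once, then append data rows from every file;
    # jq accepts multiple file arguments and applies the filter to each input.
    echo '"id","user_id","source url","status","submitted","measurements_count","approved","created_at","updated_at","name","description","lines_count","credits","height","orientation","cities","subtype","rejected"' > data.csv
    find . -maxdepth 1 -type f -name '*.json' -print0 | xargs -0 jq -r '.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected] | @csv' >> data.csv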
    
  2. Below is a bash script you can debug line by line.

    Validate the output once the single combined file is generated:

    1. You might have duplicates.
    2. The lines are not sorted by time (field created_at).

    A more complicated script could remove duplicates and sort by time; see the sketch after the script below.

    Sample bash script

    #!/bin/bash
    
    # Output file name
    output_file="output.csv"
    
    # Create the header row
    echo '"id","user_id","source url","status","submitted","measurements_count","approved","created_at","updated_at","name","description","lines_count","credits","height","orientation","cities","subtype","rejected"' > "$output_file"
    
    # Process each JSON file
    for json_file in *.json; do
        if [ -f "$json_file" ]; then
            jq -r '["id", "user_id", "source url", "status", "submitted", "measurements_count", "approved", "created_at", "updated_at", "name", "description", "lines_count", "credits", "height", "orientation", "cities", "subtype", "rejected"], (.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected]) | @csv' "$json_file" >> "$output_file"
        fi
    done
    
    echo "All JSON files processed and combined into $output_file"
    