
I am trying to convert SAFECAST API JSON to CSV suitable for spreadsheet programs like Excel so I can prepare some data summaries. I have a semi-manual wget script that downloads several JSON files based on my filters.

I am able to convert a single JSON file to a usable CSV using jq (Ubuntu Linux):

jq -r '["id", "user_id", "source url", "status", "submitted", "measurements_count", "approved", "created_at", "updated_at", "name", "description", "lines_count", "credits", "height", "orientation", "cities", "subtype", "rejected"], (.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected]) | @csv' bgeigie_imports_001.json > bgeigie_imports_001.csv

but I am unable to make it run automatically over all the JSON files in a folder (this thread did not help).

Any ideas? Loading the input/output file names from a txt file might also help, as I have a list from the wget script…

I appreciate any advice; unfortunately my skills are limited.

Thanks

PS: merging the JSON files first using jq -s '.' *.json > All_data.json produces a differently structured file, and the jq command above then fails to work properly; presumably -s wraps each file's top-level array inside one outer array, so .[] yields arrays instead of objects.
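
If that's the case, flattening with add instead should restore the expected shape (a single flat array of record objects):

# -s reads every input into one array; "add" concatenates the
# per-file arrays into a single flat array of record objects.
jq -s 'add' *.json > All_data.json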

2 Answers


  1. From your post it's unclear how many JSON files you have, how they correspond to the API link, and what they contain, so I can't produce a tested, working example; you also mentioned that the final goal is a single CSV. I'd expect something like the following to work for you:

    find . -maxdepth 1 -type f -name '*.json' -print0 | xargs -0 -I"^" jq -r '["id", "user_id", "source url", "status", "submitted", "measurements_count", "approved", "created_at", "updated_at", "name", "description", "lines_count", "credits", "height", "orientation", "cities", "subtype", "rejected"], (.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected]) | @csv' "^" >> data.csv
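
    Note that this filter prints the header row once per input file, so the combined data.csv will contain repeated header lines. A variant that writes the header only once (assuming, as in your single-file example, that each file's top level is an array of records):

    # Write the header once, then append data rows from every file;
    # jq accepts multiple file arguments and applies the filter to each input.
    echo '"id","user_id","source url","status","submitted","measurements_count","approved","created_at","updated_at","name","description","lines_count","credits","height","orientation","cities","subtype","rejected"' > data.csv
    find . -maxdepth 1 -type f -name '*.json' -print0 | xargs -0 jq -r '.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected] | @csv' >> data.csv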
    
  2. Below is a bash script you can debug line by line.

    Validate the output once the single combined file is generated:

    1. You might have duplicates.
    2. The lines are not sorted by time (field created_at).

    A more complicated script could remove duplicates and sort by time; see the sketch after the script below.

    Sample bash script

    #!/bin/bash
    
    # Output file name
    output_file="output.csv"
    
    # Create the header row
    echo '"id","user_id","source url","status","submitted","measurements_count","approved","created_at","updated_at","name","description","lines_count","credits","height","orientation","cities","subtype","rejected"' > "$output_file"
    
    # Process each JSON file
    for json_file in *.json; do
        if [ -f "$json_file" ]; then
            jq -r '["id", "user_id", "source url", "status", "submitted", "measurements_count", "approved", "created_at", "updated_at", "name", "description", "lines_count", "credits", "height", "orientation", "cities", "subtype", "rejected"], (.[] | [.id, .user_id, .source.url, .status, .submitted, .measurements_count, .approved, .created_at, .updated_at, .name, .description, .lines_count, .credits, .height, .orientation, .cities, .subtype, .rejected]) | @csv' "$json_file" >> "$output_file"
        fi
    done
    
    echo "All JSON files processed and combined into $output_file"
    