skip to Main Content

I’ve 2 files: changes.csv and sample.json. My csv file is like following:

header "a", "b"
      "a11","b1"
      "a22","b2"
      "a33","b3"

and the json file is like:

[
 {"a":"a1","b":"b1"},
 {"a":"a2","b":"b2"},
 {"a":"a3","b":"b3"},
 {"a":"a4","b":"b4"}
]

I need to write a jq command, which make changes in json file using the csv file, i.e., the final output of the json file should be like following:

[
 {"a":"a11","b":"b1"},
 {"a":"a22","b":"b2"},
 {"a":"a33","b":"b3"},
 {"a":"a4","b":"b4"}
]

I wrote the following command:

while IFS=",", read f1 f2
do
  jq --argjson k1 $f1 --argjson k2 $f2 '(.[] | select(.b == $k2) | .a) |= $k1' sample.json| sponge sample.json
done < changes.csv

Although, for each iteration it is able to filter and update the value of key "a", but when I try to sponge the results into the json file, it is unable to do so. Don’t know where exactly I am missing out.

3

Answers


  1. Chosen as BEST ANSWER

    So I just figured it out that if I transform my csv file values into input readable format and use --arg instead of --argjson, the file gets update fine. So, the command awk '{gsub(/"/,"")};1' b.csv > c.csv will convert the csv file into:

    a11,b1
    a22,b2
    a33,b3
    

    So, now if apply the command:

    while IFS=",", read f1 f2
    do
      jq --arg k1 $f1 --arg k2 $f2 '(.[] | select(.b == $k2) | .a) |= $k1' sample.json| sponge sample.json
    done < c.csv
    

    the desired changes will take place.

    The plausible reason why it failed in older format because when we read values using "read" command, it transforms "a" to ""a"". So, each time I append the answer, it returns void answer.

    The answer works with any type of json format. For example, if json object is like:

    [
     { 
      "d": { "a":"a1", "c":"c1"},
      "b": "b1"
      }
    ]
    

    we can update the command to '(.[] | select(.b == $k2) | .d.a) |= $k1'.

    I didn't try the other methods posted, but I appreciate for the help. Thanks folks (@peak & @Fravadona)


  2. Assuming the CSV is reasonably well-behaved, you could write:

    # Skip the CSV header row by NOT specifying the -n option
    < changes.csv | jq -Rcr --argfile json sample.json '
      def trim: sub("^[ t]*""; "") | sub(""[ t]*$";"");
      INDEX(inputs | split(",") | map(trim) | select(length>0); .[1]) as $dict
      | $json
      | map( .a = $dict[.b][0] )
    '
    

    For more messy CSV, you will probably want to use a CSV-to-JSON or CSV-to-TSV tool (which can both quite easily be written in jq — see e.g. https://rosettacode.org/wiki/Convert_CSV_records_to_TSV#jq)


    If you prefer not to use the –argfile option, then by all means use some other method of reading the two files, e.g. you could use –rawfile for the CSV, leaving STDIN for the JSON.

    Login or Signup to reply.
  3. For a pure jq solution, you’ll better make sure that your CSV doesn’t contain any , or " or n in any field or else the code will become a lot more complex.

    Hence, I propose a solution with Miller (available here for several OSs), which can do the task easily and robustly:

    mlr --icsv --ojson --no-jvstack join --ijson -f file.json -j 'b' --ul file.csv
    
    [
    {"b": "b1", "a": "a11"},
    {"b": "b2", "a": "a22"},
    {"b": "b3", "a": "a33"}
    ]
    

    Let’s decompose the command:

    • mlr join -f file1 -j 'b' file2
      

      will join file1 and file2 on the field b. When an other field than b exists in both files (for ex. a) then it is the value of file2 that is outputted.

      So, for updating the JSON with the values in the CSV, file1 shall be the JSON and file2 the CSV.

    • --ul means to output the unjoinable lines of file1; without it, the output will only contain the records that have been "paired".

    • With the join verb, Miller allows to specify a different input format for file1 than the one of file2; you set the default input format as --icsv (for handling file2) and override it with --ijson after the join verb (for handling file1).

    • The output format is set to JSON with --ojson. The --no-jvstack means to output one "record" per line.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search