skip to Main Content

I have a weird CSV with one column being JSON data of quite a large size and one is a name. Showing proper JSON formatting for clarity, but file is flat with no new line chars.

name, 
{
"field1":
   {"name":"foo",
   "email":"[email protected]"},
"field2":{
     "subfield":{
          "subfield2":{
               "active":1,
               "passive":11,
               "running":111
                       }
                  }
           }
}

I am trying to get the name and all unique values for the subfield 2 into a output (ideally CSV).

name, 
active passive running

The issue is that the number of fields is not constant and the names of the subfields are not constant either. I have tried using jq, Miller, and sed/awk without much luck since it combines a huge JSON blob and CSV data.

2

Answers


  1. If you run in your sample input

    <input.txt sed '/name,/d' | jq -cr '.field2.subfield.subfield2 | keys[]' | paste -s -d ' ' | mlr --csv --implicit-csv-header then label name
    

    you get

    name
    active passive running
    

    The sample output you write, is wrong, because if it’s a one field CSV, you do not have the , after name, there are no other fields

    name, 
    active passive running
    

    But I probably didn’t understand what you want

    Login or Signup to reply.
  2. You could use your favorite CSV-to-TSV translator to convert the CSV to TSV and then run it through jq like so:

    jq -rR '
      split("t") 
      | [[.0], 
         ( .[1]|fromjson|.field2.subfield.subfield2 | keys_unsorted|join(" "))] 
      | @csv'
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search