I have been using jq to successfully extract one JSON blob at a time from some relatively large files, writing the output to a file with one JSON object per line for further processing. Here is an example of the JSON format:
{
  "date": "2023-07-30",
  "results1": [
    {
      "data": [
        {"row": [{"key1": "row1", "key2": "row1"}]},
        {"row": [{"key1": "row2", "key2": "row2"}]}
      ]
    },
    {
      "data": [
        {"row": [{"key1": "row3", "key2": "row3"}]},
        {"row": [{"key1": "row4", "key2": "row4"}]}
      ]
    }
  ],
  "results2": [
    {
      "data": [
        {"row": [{"key3": "row1", "key4": "row1"}]},
        {"row": [{"key3": "row2", "key4": "row2"}]}
      ]
    },
    {
      "data": [
        {"row": [{"key3": "row3", "key4": "row3"}]},
        {"row": [{"key3": "row4", "key4": "row4"}]}
      ]
    }
  ]
}
My current approach is to run the following and redirect the stdout to a file:
jq -rc ".results1[]" my_json.json
This works fine; however, it seems that jq reads the entire file into memory in order to extract the chunk I am interested in.
Questions:
- Does jq read the entire file into memory when I execute the above statement?
- Assuming the answer is yes, is there a way that I can extract results1[] and results2[] in the same call to avoid reading the file twice?
I have used the --stream option, but it is very slow. I have also read that it sacrifices speed for memory savings; memory is not an issue at this time, so I would prefer to avoid that option. Basically, what I need is to read the above JSON once and output two files in JSON Lines format.
Edit: (I changed the input data a bit to show the differences in the output)
Output file 1:
{"data":[{"row":[{"key1":"row1","key2":"row1"}]},{"row":[{"key1":"row2","key2":"row2"}]}]}
{"data":[{"row":[{"key1":"row3","key2":"row3"}]},{"row":[{"key1":"row4","key2":"row4"}]}]}
Output file 2:
{"data":[{"row":[{"key3":"row1","key4":"row1"}]},{"row":[{"key3":"row2","key4":"row2"}]}]}
{"data":[{"row":[{"key3":"row3","key4":"row3"}]},{"row":[{"key3":"row4","key4":"row4"}]}]}
It seems pretty well known that the streaming option is slow. See the discussion here.
My attempt at implementing it followed the answer here.
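For reference, a streamed extraction of results1 along those lines looks roughly like this (a sketch of the usual fromstream/truncate_stream recipe, not necessarily the exact filter I used):

jq -nc --stream '
  fromstream(2 | truncate_stream(inputs | select(.[0][0] == "results1")))
' my_json.json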
2 Answers
jq doesn't have any file I/O facilities, so you can't write multiple output files from it directly.
You can, however, output each piece of data with its key and post-process it:
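One way is to tag each element with the name of the array it came from, something along these lines (a sketch; the file and item field names are placeholders I chose, not anything jq requires):

jq -c '{file: "results1", item: .results1[]},
       {file: "results2", item: .results2[]}' my_json.json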
outputs
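something like this, one tagged object per line, using the placeholder field names from the sketch above:

{"file":"results1","item":{"data":[{"row":[{"key1":"row1","key2":"row1"}]},{"row":[{"key1":"row2","key2":"row2"}]}]}}
{"file":"results1","item":{"data":[{"row":[{"key1":"row3","key2":"row3"}]},{"row":[{"key1":"row4","key2":"row4"}]}]}}
{"file":"results2","item":{"data":[{"row":[{"key3":"row1","key4":"row1"}]},{"row":[{"key3":"row2","key4":"row2"}]}]}}
{"file":"results2","item":{"data":[{"row":[{"key3":"row3","key4":"row3"}]},{"row":[{"key3":"row4","key4":"row4"}]}]}}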
So, to split that tagged stream into the two output files:
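A post-processing loop along these lines should work (a sketch; the output file names are my own choice, and the files are appended to, so they should not already exist):

jq -c '{file: "results1", item: .results1[]},
       {file: "results2", item: .results2[]}' my_json.json |
while IFS= read -r line; do
    # Pull out the tag and the payload, then append the payload
    # to the file named after the tag.
    key=$(jq -r '.file' <<< "$line")
    jq -c '.item' <<< "$line" >> "output_${key}.jsonl"
done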
or, to avoid running jq once for every extracted line:
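A single-pass variant (also a sketch; the "key, space, JSON" line format is my own convention) has jq print the key and the compact JSON on one line and lets the shell split them:

jq -r '(.results1[] | "results1 \(tojson)"),
       (.results2[] | "results2 \(tojson)")' my_json.json |
while read -r key json; do
    # read splits on the first space: key is the tag, json is the rest.
    printf '%s\n' "$json" >> "output_${key}.jsonl"
done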
With Bash ≥ 4, processing bigger chunks could be improved by reading n lines at once using mapfile:
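A sketch of that, building on the key-and-JSON lines from the previous variant (the chunk size of 1000 is arbitrary):

jq -r '(.results1[] | "results1 \(tojson)"),
       (.results2[] | "results2 \(tojson)")' my_json.json |
while mapfile -t -n 1000 chunk && ((${#chunk[@]})); do
    # Process up to 1000 lines per iteration to cut down on loop overhead.
    for line in "${chunk[@]}"; do
        printf '%s\n' "${line#* }" >> "output_${line%% *}.jsonl"
    done
done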