I have a huge JSON file (around 7 GB) with multiple elements; let’s call it data.json. Each element looks like this:
{'aggregation_bits': '0xf7fffffffffffdfffffffffffffdfffffffffffffffffffffff7fffffffffffffff7ff',
'data': {'slot': '6436981',
'index': '8',
'beacon_block_root': '0xf8afce2fc26df10061d641633a7256cf47ffa8793771ae83e190e72bf2c5886e',
'source': {'epoch': '201154',
'root': '0x694231a8135b3f546a6fe483ba6b467c686592f6d627aaf06b56ffbd78d75f63'},
'target': {'epoch': '201155',
'root': '0x2fc0d9f58cda026678d98f010fec13d202eec4ec9ae4dbd57878413a75fe22d7'}},
'signature': '0xadb0d576ee418e37017f423e77622ce85bdb741d0e984e78df3a250c172fc67c6c7ec94e4802d6323aee8f62b7df218300f42010fe673ed85f4e06d1361336474ecfb4db39aa2bc31b31bf6fbe52c7bff769f024faa5ba3554d5ea02fe9663c3',
'arrival': 1684067808365}
What I want to do is use jq to select all elements whose slot has a certain value and save them in a separate JSON file; let’s call it data_for_slot.json.
Let’s assume I want all elements where slot = 6436990. For this purpose, I used:
jq -c '[select(.data.slot == "6436990")]' data.json > data_for_slot.json
After running for some time, it returned data_for_slot.json, but when I opened it to inspect, what I got was:
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
...
I know for a fact that slot = 6436990 exists in some of the elements, so I was expecting an array of such elements, but got this instead. I don’t understand why it is returning results like this without throwing any error.
I am very new to jq scripting, and I would appreciate it if anyone could tell me what I am doing wrong and what the correct way of doing this is.
2 Answers
It seems that the issue you’re facing with the jq command is related to the size of your JSON file. The command you’re using is correct, but it might be causing memory constraints or performance issues when processing such a large file.
To handle large JSON files more efficiently, you can try using a streaming JSON processor like jql, which is a lightweight alternative to jq. It operates on JSON streams without loading the entire file into memory.
Here’s how you can accomplish your task using jql:
Install jql by following the instructions provided in their repository: https://github.com/cube2222/jql
Once installed, you can use the following command to extract the elements with the desired slot value:
This command will stream the JSON file and select only the elements that have slot equal to "6436990". The results will be saved in the data_for_slot.json file.
Using jql should help you avoid memory issues when working with large JSON files.

The jq filter [select(.data.slot == "6436990")] will create an array […] for each input item. As you are filtering out (most of) the input items using select(…), you get a stream of (mostly) empty arrays.
Drop the array brackets (just use select(…)) and there will be no output for input items not matching.
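To make the bracket-free filter concrete, here is a minimal sketch. It assumes data.json is a stream of top-level objects rather than one big array, which the one-[]-per-element output suggests. The file sample.json and its contents are made up for illustration and only mimic the shape of the real elements. The second command is one possible way to still get a single array of matches, using jq's -n option together with inputs.

```shell
# Tiny made-up stand-in for data.json: a stream of top-level objects.
cat > sample.json <<'EOF'
{"data": {"slot": "6436981"}, "arrival": 1684067808365}
{"data": {"slot": "6436990"}, "arrival": 1684067808999}
{"data": {"slot": "6436990"}, "arrival": 1684067809123}
EOF

# No array brackets: inputs that do not match produce no output at all,
# so the result is one matching object per line.
jq -c 'select(.data.slot == "6436990")' sample.json > sample_for_slot.json

# If a single JSON array is wanted, read the inputs explicitly with -n and
# `inputs`, and wrap only the selected items in one array.
jq -c -n '[inputs | select(.data.slot == "6436990")]' sample.json > sample_for_slot_array.json
```

The first form writes each match as soon as it is found, so it should suit a very large input; the second has to hold all matches in memory before printing the array, which is usually fine when only a small fraction of the elements match.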