
I have a huge JSON file (around 7 GB) with multiple elements; let’s call it data.json. Each element looks like this:

{'aggregation_bits': '0xf7fffffffffffdfffffffffffffdfffffffffffffffffffffff7fffffffffffffff7ff',
  'data': {'slot': '6436981',
   'index': '8',
   'beacon_block_root': '0xf8afce2fc26df10061d641633a7256cf47ffa8793771ae83e190e72bf2c5886e',
   'source': {'epoch': '201154',
    'root': '0x694231a8135b3f546a6fe483ba6b467c686592f6d627aaf06b56ffbd78d75f63'},
   'target': {'epoch': '201155',
    'root': '0x2fc0d9f58cda026678d98f010fec13d202eec4ec9ae4dbd57878413a75fe22d7'}},
  'signature': '0xadb0d576ee418e37017f423e77622ce85bdb741d0e984e78df3a250c172fc67c6c7ec94e4802d6323aee8f62b7df218300f42010fe673ed85f4e06d1361336474ecfb4db39aa2bc31b31bf6fbe52c7bff769f024faa5ba3554d5ea02fe9663c3',
  'arrival': 1684067808365}

What I want to do is use jq to select all elements whose slot has a certain value and save them in a separate JSON file; let’s call it data_for_slot.json.

Let’s assume I want all elements where slot = 6436990. For this purpose, I used:

jq -c '[select(.data.slot == "6436990")]' data.json > data_for_slot.json

After running for some time, it produced data_for_slot.json, but when I opened the file to inspect it, what I got was:

[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
...

I know for a fact that slot = 6436990 exists in some of the elements, so I was expecting an array of those elements, but got this instead. I don’t understand why it returns results like this without throwing any error.

I am very new to jq, and I would appreciate it if anyone could tell me what I am doing wrong and what the correct way of doing this is.

2 Answers


  1. It seems that the issue you’re facing with the jq command is related to the size of your JSON file. The command you’re using is correct, but it might be causing memory constraints or performance issues when processing such a large file.

    To handle large JSON files more efficiently, you can try using a streaming JSON processor like jql, which is a lightweight alternative to jq. It operates on JSON streams without loading the entire file into memory.

    Here’s how you can accomplish your task using jql:

    1. Install jql by following the instructions provided in their repository: https://github.com/cube2222/jql

    2. Once installed, you can use the following command to extract the elements with the desired slot value:

    jql -c '.[] | select(.data.slot == "6436990")' data.json > data_for_slot.json
    

    This command will stream the JSON file and select only the elements that have slot equal to "6436990". The results will be saved in the data_for_slot.json file.

    Using jql should help you avoid memory issues when working with large JSON files.
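
    jq itself also has an incremental mode, so a separate tool may not be required. The sketch below assumes the file is a single top-level array; sample_array.json is a hypothetical stand-in for data.json. If the file is instead a sequence of top-level objects, jq already reads them one at a time and --stream is not needed.

    ```shell
    # Hypothetical small stand-in for data.json, written as one top-level array
    printf '%s\n' '[{"data":{"slot":"6436981"}},{"data":{"slot":"6436990"}}]' > sample_array.json

    # --stream emits [path, leaf] events instead of parsing the whole array at once;
    # truncate_stream drops the outer array index and fromstream rebuilds each element
    jq -cn --stream 'fromstream(1 | truncate_stream(inputs)) | select(.data.slot == "6436990")' sample_array.json > stream_matches.json
    ```

    Each rebuilt element is filtered on its own, so only the matching elements need to be held in memory at any time.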

  2. The jq filter [select(.data.slot == "6436990")] will create an array […] for each input item. As you are filtering out (most of) the input items using select(…), you get a stream of (mostly) empty arrays.

    Drop the array brackets (just use select(…)) and there will be no output for input items not matching.
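
    The difference can be reproduced on a small file. This is a minimal sketch: sample.json and data_for_slot_array.json are hypothetical names, and the three elements are cut-down stand-ins for the real data. The last command is one possible way to get a single array of matches, if that is what is wanted.

    ```shell
    # Hypothetical three-element sample standing in for data.json
    printf '%s\n' \
      '{"data":{"slot":"6436981"},"arrival":1}' \
      '{"data":{"slot":"6436990"},"arrival":2}' \
      '{"data":{"slot":"6436990"},"arrival":3}' > sample.json

    # Original filter: one array per input element, so a non-matching element yields []
    jq -c '[select(.data.slot == "6436990")]' sample.json

    # Without the brackets: only matching elements are printed, one per line
    jq -c 'select(.data.slot == "6436990")' sample.json > data_for_slot.json

    # A single array of all matches: -n with inputs reads the elements one at a time
    jq -c -n '[inputs | select(.data.slot == "6436990")]' sample.json > data_for_slot_array.json
    ```

    With the second form the output file is a sequence of JSON objects (one per line), not one JSON document; the third form collects the matches into one array, which holds only the matching elements in memory.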
