Given this JSON:

{
  "hits": [
    {
      "country": "PT",
      "level": "H2",
      "id": "id1"
    },
    {
      "country": "PT",
      "level": "H1",
      "id": "id2"
    },
    {
      "country": "CZ",
      "level": "H2",
      "id": "id3"
    },
    {
      "country": "IT",
      "level": "H2",
      "id": "id4"
    },
    {
      "country": "PT",
      "level": "H3",
      "id": "id5"
    },
    {
      "country": "PT",
      "level": "H3",
      "id": "id6"
    },
    {
      "country": "PT",
      "level": "H4",
      "id": "id7"
    }
  ]
}

I would like to group this by country and into two tiers: one holding the H1 and H2 entries, the other holding the H3 and H4 entries. Furthermore, I would like the ids and levels to be wrapped in an array called "entities", as follows:

{
  "hits": [
    {
      "country": "PT",
      "entities": [
        {
          "id": "id2",
          "level": "H1"
        },
        {
          "id": "id1",
          "level": "H2"
        }
      ]
    },
    {
      "country": "CZ",
      "entities": [
        {
          "id": "id3",
          "level": "H2"
        }
      ]
    },
    {
      "country": "IT",
      "entities": [
        {
          "id": "id4",
          "level": "H2"
        }
      ]
    },
    {
      "country": "PT",
      "entities": [
        {
          "id": "id5",
          "level": "H3"
        },
        {
          "id": "id6",
          "level": "H3"
        },
        {
          "id": "id7",
          "level": "H4"
        }
      ]
    }
  ]
}

I am quite new to jq. Could anyone help me do this in jq?

4 Answers


  1. Provide group_by with two criteria: the value of .country, and whether .level belongs to a given set of values (here, IN with "H1" and "H2", which evaluates to a boolean). After grouping, use map to reshape each group as desired. Note that group_by sorts by the grouping key, so the groups come out in sorted order rather than in the order shown in the question.

    .hits |= (
      group_by(.country, IN(.level; "H1", "H2")) 
      | map((first | {country}) + {entities: map({id, level})})
    )
    
    {
      "hits": [
        {
          "country": "CZ",
          "entities": [
            {
              "id": "id3",
              "level": "H2"
            }
          ]
        },
        {
          "country": "IT",
          "entities": [
            {
              "id": "id4",
              "level": "H2"
            }
          ]
        },
        {
          "country": "PT",
          "entities": [
            {
              "id": "id5",
              "level": "H3"
            },
            {
              "id": "id6",
              "level": "H3"
            },
            {
              "id": "id7",
              "level": "H4"
            }
          ]
        },
        {
          "country": "PT",
          "entities": [
            {
              "id": "id1",
              "level": "H2"
            },
            {
              "id": "id2",
              "level": "H1"
            }
          ]
        }
      ]
    }
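
For readers who want to cross-check the grouping logic outside jq, here is a rough Python sketch of the same idea: sort by a composite key made of the country and a boolean, then collect each run of equal keys. The names group_key and regroup are made up for this illustration and do not come from jq or from the answer.

```python
from itertools import groupby

hits = [
    {"country": "PT", "level": "H2", "id": "id1"},
    {"country": "PT", "level": "H1", "id": "id2"},
    {"country": "CZ", "level": "H2", "id": "id3"},
    {"country": "IT", "level": "H2", "id": "id4"},
    {"country": "PT", "level": "H3", "id": "id5"},
    {"country": "PT", "level": "H3", "id": "id6"},
    {"country": "PT", "level": "H4", "id": "id7"},
]


def group_key(hit):
    # Composite key: the country plus a boolean "is this an H1/H2 entry?"
    return (hit["country"], hit["level"] in ("H1", "H2"))


def regroup(hits):
    # sorted() is stable, so entries keep their input order within a group;
    # False sorts before True, so a country's H3/H4 group precedes its H1/H2 group.
    result = []
    for (country, _), members in groupby(sorted(hits, key=group_key), key=group_key):
        result.append({
            "country": country,
            "entities": [{"id": m["id"], "level": m["level"]} for m in members],
        })
    return result


grouped = regroup(hits)
```

With the sample input, grouped lists CZ, IT, the PT H3/H4 group, and then the PT H1/H2 group, which is the same order as the jq output above.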
    

  2. Here is one possible solution that uses the comma operator (,), which generates multiple outputs for a single input. It also relies on the fact that a generator inside an object construction "multiplies" the object, producing one object for each generated value.

    .hits
    | group_by(.country)
    | map({
        country: first.country,
        entities: map({id,level})
        | (map(select(IN("H1","H2";.level))), map(select(IN("H3","H4";.level))))
        | select(length>0)
    })
    | { hits: . }
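
The "multiplying" behaviour can be pictured as a generator that emits one object per candidate list. The Python sketch below is only an analogy, assuming the same sample input; split_by_country, LOW and HIGH are hypothetical names chosen for this example.

```python
hits = [
    {"country": "PT", "level": "H2", "id": "id1"},
    {"country": "PT", "level": "H1", "id": "id2"},
    {"country": "CZ", "level": "H2", "id": "id3"},
    {"country": "IT", "level": "H2", "id": "id4"},
    {"country": "PT", "level": "H3", "id": "id5"},
    {"country": "PT", "level": "H3", "id": "id6"},
    {"country": "PT", "level": "H4", "id": "id7"},
]

LOW = ("H1", "H2")
HIGH = ("H3", "H4")


def split_by_country(hits):
    # Step 1: group by country only (the sorted() below mimics group_by's ordering).
    by_country = {}
    for hit in hits:
        by_country.setdefault(hit["country"], []).append(hit)

    for country in sorted(by_country):
        entities = [{"id": h["id"], "level": h["level"]} for h in by_country[country]]
        # Step 2: the two buckets play the role of the two comma-separated outputs.
        # Each non-empty bucket yields its own object for the same country.
        for bucket in (LOW, HIGH):
            subset = [e for e in entities if e["level"] in bucket]
            if subset:  # mirrors select(length > 0)
                yield {"country": country, "entities": subset}


result = {"hits": list(split_by_country(hits))}
```

Here PT produces two objects (one per bucket), while CZ and IT produce one each because their H3/H4 bucket is empty and gets filtered out.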
    
  3. Here’s a straightforward solution using the generic stream-oriented function aggregate_by/2:

    def aggregate_by(s; f):
      reduce s as $x  (null; .[$x|f] += [$x]);
    
    aggregate_by(.hits[]; .level)
    | [aggregate_by( (.H1 + .H2)[]; .country)] +
      [aggregate_by( (.H3 + .H4)[]; .country)]
    | map( to_entries[]
           | {country: .key, entities: (.value | map(del(.country))) } )
    | {hits: .}
    
    

    Using aggregate_by here makes the solution quite straightforward once one understands what it does. It also makes the program relatively efficient compared to group_by/1: aggregate_by builds its buckets in a single pass over the stream, whereas group_by/1 relies on sorting.
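
As a rough illustration of the single-pass idea, here is a Python sketch of the same bucketing approach. It is an analogy to the jq definition above, not a translation of jq internals; the Python aggregate_by, by_level, low and high are names chosen for this example.

```python
hits = [
    {"country": "PT", "level": "H2", "id": "id1"},
    {"country": "PT", "level": "H1", "id": "id2"},
    {"country": "CZ", "level": "H2", "id": "id3"},
    {"country": "IT", "level": "H2", "id": "id4"},
    {"country": "PT", "level": "H3", "id": "id5"},
    {"country": "PT", "level": "H3", "id": "id6"},
    {"country": "PT", "level": "H4", "id": "id7"},
]


def aggregate_by(items, key):
    # One pass, no sorting: append each item to the bucket named by key(item).
    buckets = {}
    for item in items:
        buckets.setdefault(key(item), []).append(item)
    return buckets


# First bucket by level, then bucket each tier by country.
by_level = aggregate_by(hits, lambda h: h["level"])
low = aggregate_by(by_level.get("H1", []) + by_level.get("H2", []), lambda h: h["country"])
high = aggregate_by(by_level.get("H3", []) + by_level.get("H4", []), lambda h: h["country"])

result = {"hits": [
    {
        "country": country,
        # Drop the country key from each member, like map(del(.country)).
        "entities": [{k: v for k, v in h.items() if k != "country"} for h in members],
    }
    for buckets in (low, high)
    for country, members in buckets.items()
]}
```

Python dicts keep insertion order, so with the sample input this sketch yields PT (H1/H2), CZ, IT, then PT (H3/H4), which happens to match the order requested in the question.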

  4. In case it is helpful to someone: if we start with something very similar to the selected answer, but with minor changes for readability:

    .hits |= [
    
      group_by(.country, (.level == "H1" or .level == "H2")) []
      | { country : first.country, entities: [.[] | { level, id }] }
    
    ]
    

    we can accommodate arbitrarily complex grouping criteria as follows:

    def transform:
      if    . == "H1" then 1
      elif  . == "H2" then 1
      elif  . == "H3" then 2
      elif  . == "H4" then 2
      else  3
      end;
    
    .hits |= [
     group_by(.country, (.level | transform)) []
     | { country : first.country, entities: [.[] | del(.country) ] }
    ]
    

    This version also keeps every property except .country, so there is no need to whitelist the properties to keep.
