Json – How to group array entries by key and by key value in jq?

António
September 19, 2023
279 views
4 votes
4 Answers

Given this json:

{
  "hits": [
    {
      "country": "PT",
      "level": "H2",
      "id": "id1"
    },
    {
      "country": "PT",
      "level": "H1",
      "id": "id2"
    },
    {
      "country": "CZ",
      "level": "H2",
      "id": "id3"
    },
    {
      "country": "IT",
      "level": "H2",
      "id": "id4"
    },
    {
      "country": "PT",
      "level": "H3",
      "id": "id5"
    },
    {
      "country": "PT",
      "level": "H3",
      "id": "id6"
    },
    {
      "country": "PT",
      "level": "H4",
      "id": "id7"
    }
  ]
}

I would like to group this by country and into two level bands: one containing the H1 and H2 entries, the other containing the H3 and H4 entries. Furthermore, I would like the ids and levels to be encapsulated in an array called "entities", as follows:

{
  "hits": [
    {
      "country": "PT",
      "entities": [
        {
          "id": "id2",
          "level": "H1"
        },
        {
          "id": "id1",
          "level": "H2"
        }
      ]
    },
    {
      "country": "CZ",
      "entities": [
        {
          "id": "id3",
          "level": "H2"
        }
      ]
    },
    {
      "country": "IT",
      "entities": [
        {
          "id": "id4",
          "level": "H2"
        }
      ]
    },
    {
      "country": "PT",
      "entities": [
        {
          "id": "id5",
          "level": "H3"
        },
        {
          "id": "id6",
          "level": "H3"
        },
        {
          "id": "id7",
          "level": "H4"
        }
      ]
    }
  ]
}

I am quite new to jq. Could anyone help me do this in jq?

Tags: group-by jq json

Answers

Provide group_by with two criteria: one is simply the value of .country, the other is whether .level belongs to a given set of values (here, using IN with H1 and H2, which evaluates to a boolean). After the grouping, use map to reshape the individual groups as desired.

.hits |= (
  group_by(.country, IN(.level; "H1", "H2")) 
  | map((first | {country}) + {entities: map({id, level})})
)

{
  "hits": [
    {
      "country": "CZ",
      "entities": [
        {
          "id": "id3",
          "level": "H2"
        }
      ]
    },
    {
      "country": "IT",
      "entities": [
        {
          "id": "id4",
          "level": "H2"
        }
      ]
    },
    {
      "country": "PT",
      "entities": [
        {
          "id": "id5",
          "level": "H3"
        },
        {
          "id": "id6",
          "level": "H3"
        },
        {
          "id": "id7",
          "level": "H4"
        }
      ]
    },
    {
      "country": "PT",
      "entities": [
        {
          "id": "id1",
          "level": "H2"
        },
        {
          "id": "id2",
          "level": "H1"
        }
      ]
    }
  ]
}
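
To see why the two criteria work as one key, it can help to print the composite key that group_by compares. The following is a minimal sketch, assuming jq 1.6 or later (for IN) is on the PATH; the three sample hits are a cut-down version of the input from the question.

```shell
# Show the composite grouping key [country, level-is-H1-or-H2] for each hit.
keys=$(jq -nc '
  [ {country: "PT", level: "H2"},
    {country: "PT", level: "H3"},
    {country: "CZ", level: "H2"} ]
  | map([.country, IN(.level; "H1", "H2")])
')
echo "$keys"
# prints [["PT",true],["PT",false],["CZ",true]]
```

group_by sorts the groups by this key, and jq orders false before true, which is why the PT group with H3 and H4 comes before the PT group with H1 and H2 in the output above.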


- knittl
- September 14, 2023 at 10:34 pm
- 0 votes
Here is one possible solution which makes use of the , operator, which generates multiple outputs for a single input. It also uses the fact that iterating inside an object construction "multiplies" the object (generating multiple objects, one for each iterated value).
```
.hits
| group_by(.country)
| map({
    country: first.country,
    entities: map({id,level})
    | (map(select(IN("H1","H2";.level))), map(select(IN("H3","H4";.level))))
    | select(length>0)
})
| { hits: . }
```
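
The "multiplying" behaviour can be seen in isolation with a small sketch (assuming jq is on the PATH; the numeric arrays are placeholder values, not data from the question). The value of entities yields three arrays through the , operator, select(length > 0) drops the empty one, and one object is emitted per remaining array.

```shell
# One object is produced for each output of the "entities" expression.
objs=$(jq -nc '
  [ { country: "PT",
      entities: (([1, 2], [], [3]) | select(length > 0)) } ]
')
echo "$objs"
# prints [{"country":"PT","entities":[1,2]},{"country":"PT","entities":[3]}]
```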

- peak
- September 15, 2023 at 3:20 am
- 0 votes
Here’s a straightforward solution using the generic stream-oriented function aggregate_by/2:
```
def aggregate_by(s; f):
  reduce s as $x  (null; .[$x|f] += [$x]);

aggregate_by(.hits[]; .level)
| [aggregate_by( (.H1 + .H2)[]; .country)] +
  [aggregate_by( (.H3 + .H4)[]; .country)]
| map( to_entries[]
       | {country: .key, entities: (.value | map(del(.country))) } )
| {hits: .}
```
Using aggregate_by here not only makes the solution quite straightforward once one understands what it does, but also makes the program relatively efficient in comparison to group_by/1 in that the latter relies on sorting.
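
As a rough illustration of what aggregate_by does, the sketch below (assuming jq is on the PATH; the three sample items are made up for brevity) buckets a stream by .level and then keeps only the ids so the result stays short. The --sort-keys flag (-S) is used only to make the printed key order predictable.

```shell
# Bucket a stream of items by .level, then list the ids in each bucket.
agg=$(jq -ncS '
  def aggregate_by(s; f):
    reduce s as $x (null; .[$x|f] += [$x]);
  aggregate_by(
    ({id: "id1", level: "H2"}, {id: "id2", level: "H1"}, {id: "id3", level: "H2"});
    .level)
  | map_values(map(.id))
')
echo "$agg"
# prints {"H1":["id2"],"H2":["id1","id3"]}
```

Each item is appended to the array stored under its key in a single pass over the stream, so no sort is needed to form the buckets.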

- sudocracy
- September 15, 2023 at 9:39 am
- 0 votes
In case it is helpful to someone: if we start with something very similar to the selected answer, but with minor changes for readability:
```
.hits |= [

  group_by(.country, (.level == "H1" or .level == "H2")) []
  | { country : first.country, entities: [.[] | { level, id }] }

]
```
we can accommodate arbitrarily complex grouping criteria as follows:
```
def transform:
  if    . == "H1" then 1
  elif  . == "H2" then 1
  elif  . == "H3" then 2
  elif  . == "H4" then 2
  else  3
  end;

.hits |= [
 group_by(.country, (.level | transform)) []
 | { country : first.country, entities: [.[] | del(.country) ] }
]
```
It also keeps all properties except .country instead of having to provide a whitelist of values.
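
To check which bucket each level falls into, the mapping can be run on its own. This is a sketch with a condensed variant of transform (the two H1/H2 branches and the two H3/H4 branches are merged with or), assuming jq is on the PATH; "H9" is a hypothetical level used only to exercise the else branch.

```shell
# Map each level to its bucket number; unknown levels fall into bucket 3.
buckets=$(jq -nc '
  def transform:
    if   . == "H1" or . == "H2" then 1
    elif . == "H3" or . == "H4" then 2
    else 3
    end;
  ["H1", "H2", "H3", "H4", "H9"] | map(transform)
')
echo "$buckets"
# prints [1,1,2,2,3]
```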
