
For a simple key-value pair list JSON, use jq to print a summary by range of values

JojoThomas
July 30, 2023
0 votes
2 Answers

Consider the following JSON object, a flat list of key-value pairs:

{
  "session1": 128,
  "session2": 1048596,
  "session3": 3145728,
  "session4": 3145828,
  "session5": 11534338,
  "session6": 11544336,
  "session7": 2097252
}

Each key is a session identifier, and each value is the length of the data stored in that session.

I want to print the count of values in each range, where the ranges are 0-1MB, 1-2MB, 2-3MB, … 12-13MB (lower bound included, upper bound excluded).

 1MB =  1048576
 2MB =  2097152
 3MB =  3145728
 4MB =  4194304
 5MB =  5242880
 6MB =  6291456
 7MB =  7340032
 8MB =  8388608
 9MB =  9437184
10MB = 10485760
11MB = 11534336
12MB = 12582912
13MB = 13631488

The expected output is:

{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "10-11MB": 2
}

The above is just representative; suggestions are welcome.

Tags: frequency-distribution jq json

Answers

- fizzie
- July 30, 2023 at 1:14 pm
- 0 votes
The following should work:
```
to_entries
| map(.value / 1048576 | floor | [tostring, "-", (.+1 | tostring), "MB"] | add)
| group_by(.)
| map({"key": .[0], "value": length})
| from_entries
```
For your input, it produces the following output:
```
{
  "0-1MB": 1,
  "1-2MB": 1,
  "11-12MB": 2,
  "2-3MB": 1,
  "3-4MB": 2
}
```
(11534338 and 11544336 are counted in the "11-12MB" bucket rather than the "10-11MB" one, because 11*2^20 = 11534336, and those numbers are larger than that.)
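The boundary arithmetic can be cross-checked outside jq. A minimal Python sketch, assuming 1MB means 2^20 bytes as in the question's table (the helper name `bucket_index` is made up for illustration):

```python
MB = 1 << 20  # 1048576, matching the question's table


def bucket_index(size: int) -> int:
    """Return k such that k*MB <= size < (k+1)*MB."""
    return size // MB


for size in (11534338, 11544336):
    k = bucket_index(size)
    print(f"{size} -> {k}-{k + 1}MB (lower bound {k * MB})")
```

Both values come out with index 11, since each is at least 11534336 and well below 12582912.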

If you wanted the keys in numeric order, you could also convert them to your preferred string labels after the group_by:
```
to_entries
| map(.value / 1048576 | floor)
| group_by(.)
| map({"key": [(.[0] | tostring), "-", (.[0]+1 | tostring), "MB"] | add, "value": length})
| from_entries
```
Which produces:
```
{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}
```
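The difference in key order between the two variants comes from what `group_by` sorts: label strings in the first, numbers in the second. A small Python sketch of the same effect, assuming plain ASCII labels (Python also compares strings character by character, so it orders these labels the same way); the `label` helper is illustrative only:

```python
indices = [0, 1, 3, 3, 11, 11, 2]  # floor(value / 1048576) for the sample input


def label(k: int) -> str:
    """Build the "X-YMB" range label for bucket index k."""
    return f"{k}-{k + 1}MB"


# Sorting the labels compares characters, so "11-12MB" lands before "2-3MB".
by_string = sorted({label(k) for k in indices})

# Sorting the numbers first and labelling afterwards keeps numeric order.
by_number = [label(k) for k in sorted(set(indices))]

print(by_string)
print(by_number)
```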
Both solutions have the same basic steps:
1. Convert the input object to an array of {"key": x, "value": y} entries (to_entries).
2. Map the entries into something that identifies the range they’re in, by rounding down to the nearest megabyte (.value / 1048576 | floor).
3. Group by the value (group_by). This produces an array like [[0], [1], [2], [3, 3], [11, 11]] for your input.
4. For each group, produce an entry where the "key" field is the range label ("X-YMB") and the "value" is the number of elements in the group (length).
5. Convert the list of entries back to a single object (from_entries).
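
For comparison, the same five steps can be sketched in Python. This is only a rough equivalent for checking the logic, not a replacement for the jq filters; the variable names are illustrative and the data is the sample input from the question:

```python
from itertools import groupby

MB = 1 << 20  # 1048576

sessions = {
    "session1": 128,
    "session2": 1048596,
    "session3": 3145728,
    "session4": 3145828,
    "session5": 11534338,
    "session6": 11544336,
    "session7": 2097252,
}

# Steps 1-2: walk the key/value entries and round each value down to whole MB.
indices = [size // MB for _key, size in sessions.items()]

# Step 3: itertools.groupby only groups adjacent items, so sort first
# (jq's group_by does the sorting for you).
groups = [list(g) for _k, g in groupby(sorted(indices))]

# Steps 4-5: one label/count pair per group, collected into a single dict.
summary = {f"{g[0]}-{g[0] + 1}MB": len(g) for g in groups}

print(groups)
print(summary)
```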

- pmf
- July 30, 2023 at 2:57 pm
- 0 votes
Here’s an approach using reduce: it iterates over the input values, integer-divides each by 1MB, and increments the corresponding result field by one.
```
reduce (.[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)
```
```
{
  "0-1MB": 1,
  "1-2MB": 1,
  "3-4MB": 2,
  "11-12MB": 2,
  "2-3MB": 1
}
```

The stream of numbers iterated over can, of course, be sorted first to get an object with increasing field names:
```
reduce (map(.) | sort[] / 1048576 | floor) as $k ({}; ."\($k)-\($k+1)MB" += 1)
```
```
{
  "0-1MB": 1,
  "1-2MB": 1,
  "2-3MB": 1,
  "3-4MB": 2,
  "11-12MB": 2
}
```
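
To see why sorting the values first changes the field order, here is a rough Python analogue of both reduce variants. It assumes Python 3.7+, where dicts keep insertion order much like the jq objects in the outputs shown here; `bump` and `summarize` are hypothetical helper names:

```python
from functools import reduce

MB = 1 << 20  # 1048576
sizes = [128, 1048596, 3145728, 3145828, 11534338, 11544336, 2097252]


def bump(counts: dict, k: int) -> dict:
    """Return a copy of counts with the bucket for index k incremented."""
    key = f"{k}-{k + 1}MB"
    return {**counts, key: counts.get(key, 0) + 1}


def summarize(values, sort_first=False):
    """Fold the values into a label -> count dict, optionally sorting first."""
    ordered = sorted(values) if sort_first else values
    return reduce(bump, (v // MB for v in ordered), {})


print(summarize(sizes))                   # fields appear in input order
print(summarize(sizes, sort_first=True))  # fields appear in increasing order
```

A field is created the first time its bucket is seen, so the order of the fields follows the order of the stream being reduced.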