I have a bit of a thorny JSON manipulation problem. I have half a mind to just write a Python program to do it, but I’m wondering if a well-written jq query can solve it more elegantly, partly for a cleaner solution and partly for pedagogic purposes. (I’m a jq noob and would love to take this opportunity to learn.)
I have the following JSON, printed from a tool whose output format I cannot modify:
[
  {
    "ExifTool:ExifTool:ExifTool": {
      "ExifToolVersion": 12.76
    },
    "SourceFile": "./_DSC5848.JPG",
    "File:System:Other": {
      "FileName": "_DSC5848.JPG",
      "Directory": ".",
      "FileSize": "82 kB",
      "FilePermissions": "-rw-r--r--"
    },
    "EXIF:ExifIFD:Camera": {
      "ExposureProgram": "Aperture-priority AE",
      "MaxApertureValue": 1.4,
      "Sharpness": "Normal"
    },
    "File:System:Time": {
      "FileModifyDate": "2024:09:24 14:10:16-07:00",
      "FileAccessDate": "2024:09:28 00:13:26-07:00",
      "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
    },
    "EXIF:ExifIFD:Image": {
      "ExposureTime": "1/50",
      "FNumber": 4.0,
      "ISO": 200
    },
    ... additional arbitrary colon-keys ...
  },
  { ... },
  { ... },
  { ... },
  { ... }
]
I need the keys containing colons (I’ll call them “colon-keys”) to be recursively “unrolled” such that "A:B:C": { ... } becomes:
"A": {
  "B": {
    "C": { ... }
  }
}
Colon-keys with identical prefixes would be merged. For example, if there is also a colon-key "A:B:D": { ... }, the above would become:
"A": {
  "B": {
    "C": { ... },
    "D": { ... }
  }
}
Preserving the order of keys isn’t crucial, though it’d be cool if possible. It’s not known in advance what the names of the colon-keys will be, so hard-coding them unfortunately isn’t an option.
So to circle back to the example from the beginning of this post, the output would look like:
[
  {
    "ExifTool": {
      "ExifTool": {
        "ExifTool": {
          "ExifToolVersion": 12.76
        }
      }
    },
    "SourceFile": "./_DSC5848.JPG",
    "File": {
      "System": {
        "Other": {
          "FileName": "_DSC5848.JPG",
          "Directory": ".",
          "FileSize": "82 kB",
          "FilePermissions": "-rw-r--r--"
        },
        "Time": {
          "FileModifyDate": "2024:09:24 14:10:16-07:00",
          "FileAccessDate": "2024:09:28 00:13:26-07:00",
          "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
        }
      }
    },
    "EXIF": {
      "ExifIFD": {
        "Camera": {
          "ExposureProgram": "Aperture-priority AE",
          "MaxApertureValue": 1.4,
          "Sharpness": "Normal"
        },
        "Image": {
          "ExposureTime": "1/50",
          "FNumber": 4.0,
          "ISO": 200
        }
      }
    }
  },
  { ... },
  { ... },
  { ... },
  { ... }
]
Is this possible to do with a well-written jq query, or is my only option a hand-rolled program?
Bonus: would such a query be able to handle colon-keys of arbitrary length (A:B, A:B:C, A:B:C:D, etc.) and at arbitrary levels of the JSON ("A:B:C": { "D:E": { ... } })?
2 Answers
Break up the document into a stream of path-value pairs using tostream, discarding the back-tracking items by using select to keep only those that have a value (at position 1). Then rearrange the path arrays by joining them and splitting them again on the colon. Finally, reconstruct the output object using setpath, doing this for each item of the outer array in a map.
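For readers who want to see the moving parts outside jq, the same three steps (flatten each item to path/leaf pairs, join and re-split the paths on the colon, rebuild with a set-path helper) can be sketched in Python, the fallback the question mentions. This is not the jq query itself: the names to_stream, split_path, set_path and unroll are made up for this sketch, and it assumes no key is both a scalar and a prefix of another colon-key. Because the join stringifies list indexes, lists nested inside an item would come back as objects keyed by "0", "1", and so on.

```python
import json


def to_stream(value, path=()):
    """Yield (path, leaf) pairs; scalars and empty containers count as leaves."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from to_stream(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from to_stream(child, path + (index,))
    else:
        yield path, value


def split_path(path):
    """Join the path with ':' and split it again, so 'A:B' becomes 'A', 'B'."""
    return ":".join(str(part) for part in path).split(":")


def set_path(root, path, value):
    """Create nested dicts along path and store value at its end."""
    node = root
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def unroll(item):
    """Rebuild one item of the outer array from its re-split paths."""
    if not isinstance(item, dict) or not item:
        return item  # nothing to unroll
    result = {}
    for path, leaf in to_stream(item):
        set_path(result, split_path(path), leaf)
    return result


# Hypothetical sample data, shaped like the question's input.
sample = [{"A:B:C": {"x": 1}, "A:B:D": {"y": 2}, "SourceFile": "./f.jpg"}]
print(json.dumps([unroll(item) for item in sample], indent=2))
```

Since only keys end up in the paths, colons inside values (such as the timestamps in the question) are left alone, and colon-keys nested at deeper levels are unrolled as well.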
Using unroll_keys as defined below gives quite a lot of flexibility. To unroll the keys of all objects, you could, for example, then use walk as shown below:
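A rough Python counterpart of this idea, not the answer's jq definition: one helper unrolls the colon-keys of a single object, and a separate bottom-up traversal applies it to every object in the document. Both function bodies and the name walk_objects are assumptions made for this sketch. It assumes no key is both a scalar and a prefix of another colon-key, and merging of duplicate branches is shallow.

```python
def unroll_keys(obj):
    """Unroll the colon-keys of a single dict (one level only)."""
    result = {}
    for key, value in obj.items():
        *parents, last = key.split(":")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        if isinstance(value, dict) and isinstance(node.get(last), dict):
            node[last].update(value)  # shallow merge into an existing branch
        else:
            node[last] = value
    return result


def walk_objects(value, fn):
    """Apply fn bottom-up to every dict; lists are traversed, scalars kept."""
    if isinstance(value, dict):
        return fn({k: walk_objects(v, fn) for k, v in value.items()})
    if isinstance(value, list):
        return [walk_objects(v, fn) for v in value]
    return value


# Hypothetical sample with colon-keys at several levels and inside a list.
sample = [{"A:B:C": {"D:E": {"k": 1}}, "A:B:D": [{"X:Y": 2}], "S": "f"}]
unrolled = walk_objects(sample, unroll_keys)
```

Keeping the per-object step separate from the traversal is where the flexibility comes from: unroll_keys can be applied to the top-level items only, or to every object, and lists stay lists either way.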