
I have a bit of a thorny JSON manipulation problem. I have half a mind to just write a Python program to do it, but I’m wondering if a well-written jq query can solve it more elegantly — partly for a cleaner solution, and partly for pedagogic purposes. (I’m a jq noob and would love to take this opportunity to learn.)

I have the following JSON, printed from a tool whose output format I cannot modify:

[
  {
    "ExifTool:ExifTool:ExifTool": {
      "ExifToolVersion": 12.76
    },
    "SourceFile": "./_DSC5848.JPG",
    "File:System:Other": {
      "FileName": "_DSC5848.JPG",
      "Directory": ".",
      "FileSize": "82 kB",
      "FilePermissions": "-rw-r--r--"
    },
    "EXIF:ExifIFD:Camera": {
      "ExposureProgram": "Aperture-priority AE",
      "MaxApertureValue": 1.4,
      "Sharpness": "Normal"
    },
    "File:System:Time": {
      "FileModifyDate": "2024:09:24 14:10:16-07:00",
      "FileAccessDate": "2024:09:28 00:13:26-07:00",
      "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
    },
    "EXIF:ExifIFD:Image": {
      "ExposureTime": "1/50",
      "FNumber": 4.0,
      "ISO": 200
    },
    ... additional arbitrary colon-keys ...
  },
  { ... },
  { ... },
  { ... },
  { ... }
]

I need the keys containing colons (I’ll call them “colon-keys”) to be recursively “unrolled” such that "A:B:C": { ... } becomes:

"A": {
  "B": {
    "C": { ... }
  }
}

Colon-keys with identical prefixes would be merged. For example, if there is also a colon-key "A:B:D": { ... }, the above would become:

"A": {
  "B": {
    "C": { ... },
    "D": { ... }
  }
}

Preserving the order of keys isn’t crucial, though it’d be cool if possible. It’s not known in advance what the names of the colon-keys will be, so hard-coding them unfortunately isn’t an option.
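For reference, the hand-rolled Python route mentioned at the top might look roughly like the sketch below. It handles only the top-level keys of a single object, and the function name `unroll_top_level` is made up for illustration. It assumes no key is both a plain value and a prefix of a colon-key (for example `"A": 1` alongside `"A:B": {...}`). Because Python dicts keep insertion order, the first appearance of each prefix decides where it lands in the output.

```python
def unroll_top_level(obj):
    """Unroll the colon-keys of one dict; keys sharing a prefix are merged."""
    result = {}
    for key, value in obj.items():
        *parents, last = key.split(":")
        node = result
        for part in parents:
            # Reuse an existing intermediate dict so "A:B:C" and "A:B:D" merge.
            node = node.setdefault(part, {})
        node[last] = value
    return result


record = {
    "SourceFile": "./_DSC5848.JPG",
    "File:System:Other": {"FileName": "_DSC5848.JPG"},
    "File:System:Time": {"FileModifyDate": "2024:09:24 14:10:16-07:00"},
}
print(unroll_top_level(record))
```

Only keys are split, so values that happen to contain colons (such as the timestamps) are left alone.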

So to circle back to the example from the beginning of this post, the output would look like:

[
  {
    "ExifTool": {
      "ExifTool": {
        "ExifTool": {
          "ExifToolVersion": 12.76
        }
      }
    },
    "SourceFile": "./_DSC5848.JPG",
    "File": {
      "System": {
        "Other": {
          "FileName": "_DSC5848.JPG",
          "Directory": ".",
          "FileSize": "82 kB",
          "FilePermissions": "-rw-r--r--"
        },
        "Time": {
          "FileModifyDate": "2024:09:24 14:10:16-07:00",
          "FileAccessDate": "2024:09:28 00:13:26-07:00",
          "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
        }
      }
    },
    "EXIF": {
      "ExifIFD": {
        "Camera": {
          "ExposureProgram": "Aperture-priority AE",
          "MaxApertureValue": 1.4,
          "Sharpness": "Normal"
        },
        "Image": {
          "ExposureTime": "1/50",
          "FNumber": 4.0,
          "ISO": 200
        }
      }
    }
  },
  { ... },
  { ... },
  { ... },
  { ... }
]

Is this possible to do with a well-written jq query, or is my only option a hand-rolled program?

Bonus: would such a query be able to handle colon-keys of arbitrary length (A:B, A:B:C, A:B:C:D, etc.) and at arbitrary levels of the JSON ("A:B:C": { "D:E": { ... } })?
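To make the bonus requirement concrete, here is a hedged Python sketch of the behaviour being asked for: colon-keys of any length, at any nesting level, including inside arrays. The helper names `deep_merge` and `unroll` are hypothetical, and on a conflict between a plain value and an object under the same key the later entry simply wins, which is an assumption rather than anything the question specifies.

```python
def deep_merge(target, source):
    """Merge source into target in place, recursing where both sides are dicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = value
    return target


def unroll(value):
    """Recursively unroll colon-keys in dicts nested anywhere inside value."""
    if isinstance(value, list):
        return [unroll(item) for item in value]
    if not isinstance(value, dict):
        return value
    result = {}
    for key, child in value.items():
        nested = unroll(child)
        # Wrap the child from the innermost key segment outwards.
        for part in reversed(key.split(":")):
            nested = {part: nested}
        deep_merge(result, nested)
    return result


sample = [{"A:B:C": {"D:E": {"k": 1}}, "A:B": {"X": 2}}]
print(unroll(sample))
```

Children are processed before their parent key is split, so a nested colon-key such as "D:E" is already unrolled by the time "A:B:C" is wrapped around it.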

2 Answers


  1. Break the document up into a stream of path-value pairs using tostream, discarding the back-tracking events by selecting only the items that have a value (at position 1). Then rebuild each path array by joining its elements with a colon and splitting the result on colons again, so that every colon-key becomes several path segments. Finally, reconstruct the output object with setpath, doing this for each item of the outer array inside a map.

    map(reduce (tostream | select(has(1))) as $i ({};
      setpath($i[0] | join(":") / ":"; $i[1])
    ))
    
    [
      {
        "ExifTool": {
          "ExifTool": {
            "ExifTool": {
              "ExifToolVersion": 12.76
            }
          }
        },
        "SourceFile": "./_DSC5848.JPG",
        "File": {
          "System": {
            "Other": {
              "FileName": "_DSC5848.JPG",
              "Directory": ".",
              "FileSize": "82 kB",
              "FilePermissions": "-rw-r--r--"
            },
            "Time": {
              "FileModifyDate": "2024:09:24 14:10:16-07:00",
              "FileAccessDate": "2024:09:28 00:13:26-07:00",
              "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
            }
          }
        },
        "EXIF": {
          "ExifIFD": {
            "Camera": {
              "ExposureProgram": "Aperture-priority AE",
              "MaxApertureValue": 1.4,
              "Sharpness": "Normal"
            },
            "Image": {
              "ExposureTime": "1/50",
              "FNumber": 4.0,
              "ISO": 200
            }
          }
        }
      }
    ]
    

  2. Defining unroll_keys as below gives quite a lot of flexibility.
    To unroll the keys of every object in the input, you could, for example, apply it with walk:

    # Assume the input is an object
    def unroll_keys:
      def unroll:
          .value as $value
          | .key | split(":") as $a
          | {} | setpath($a; $value);
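      # with_entries would rebuild the object via from_entries, where a later
      # "File" entry replaces an earlier one; deep-merging with * keeps both.
      reduce (to_entries[]
              | if .key|test(":") then unroll else {(.key): .value} end) as $o
        ({}; . * $o);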
    
    walk(if type == "object" then unroll_keys else . end)
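For readers more at home in Python, the path-based idea from the first answer can be approximated as follows. This is only a loose analogue, not a translation: the names `leaf_paths` and `rebuild` are invented here, and unlike jq's tostream this sketch treats lists as opaque leaves instead of descending into them.

```python
def leaf_paths(value, path=()):
    """Yield (path, leaf) pairs, loosely mirroring tostream | select(has(1))."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from leaf_paths(child, path + (key,))
    else:
        # Scalars, lists, and empty dicts are treated as leaves in this sketch.
        yield path, value


def rebuild(obj):
    """Re-split every key path on ':' and set the leaf at the new path."""
    if not obj:
        return {}
    result = {}
    for path, leaf in leaf_paths(obj):
        # Join the path segments and split again, as the jq query does.
        *parents, last = ":".join(path).split(":")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[last] = leaf
    return result


data = [{"EXIF:ExifIFD:Camera": {"Sharpness": "Normal"},
         "EXIF:ExifIFD:Image": {"ISO": 200}}]
print([rebuild(item) for item in data])
```

Because every path is rebuilt from its leaves, shared prefixes merge naturally and colon-keys at any depth are split, without a separate merge step.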
    