
Suppose I have the following JSON document (inspired by this post)

Initial document

{
  "key": "value",
  "ips": [
    {
      "ip": "1.2.3.4",
      "macAddress": "ac:5f:3e:87:d7:1a"
    },
    {
      "ip": "5.6.7.8",
      "macAddress": "ac:5f:3e:87:d7:2a"
    },
    {
      "ip": "9.10.11.12",
      "macAddress": "ac:5f:3e:87:d7:3a"
    },
    {
      "ip": "13.14.15.16",
      "macAddress": "42:12:20:2e:2b:ca"
    }
  ]
}

Now I would like to read every macAddress, pass it to a hash function (e.g. md5sum), and write the result back to the JSON document.

Desired output

{
  "key": "value",
  "ips": [
    {
      "ip": "1.2.3.4",
      "macAddress": "45ee585278a0717c642ff2cb25a8e441"
    },
    {
      "ip": "5.6.7.8",
      "macAddress": "ab47bf90cb9f385127977569e676ce70"
    },
    {
      "ip": "9.10.11.12",
      "macAddress": "a5e9785db428e3956a47776dbd00fc91"
    },
    {
      "ip": "13.14.15.16",
      "macAddress": "f75d61937f70252ff139adee241daab4"
    }
  ]
}
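
For reference, hashing a single address by hand would look something like this (the trailing newline has to be suppressed, e.g. via printf '%s', otherwise the digest changes):

printf '%s' 'ac:5f:3e:87:d7:1a' | md5sum | cut -d ' ' -f1

This should print 45ee585278a0717c642ff2cb25a8e441, the first hash in the desired output above.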

Currently I have the following shell script, but I think it can be done more elegantly, preferably in a one-liner.

json_doc='{"key": "value", "ips": [{"ip":"1.2.3.4","macAddress":"ac:5f:3e:87:d7:1a"},{"ip":"5.6.7.8","macAddress":"ac:5f:3e:87:d7:2a"},{"ip":"9.10.11.12","macAddress":"ac:5f:3e:87:d7:3a"},{"ip":"13.14.15.16","macAddress":"42:12:20:2e:2b:ca"}]}'

ip_list=$(jq -c '.ips[]' <<< "$json_doc" |
while read -r jsonline ; do
  hashmac="$(jq -s -j '.[] | .macAddress' <<<"$jsonline" | md5sum | cut -d ' ' -f1)"
  jq --arg hashmac "$hashmac" -s -r '.[] | .macAddress |= "\($hashmac)"' <<<"$jsonline"
done | jq -s)

# Update json document with ip list containing hashed mac addresses
jq --argjson ips "$ip_list" '.ips = $ips' <<<"$json_doc"


2 Answers


  1. A variation of peak's answer from the linked question. Two invocations of jq: the first calculates the md5 hashes, the second reconstructs the result back into the original JSON using reduce. (The example uses BSD/macOS md5; with GNU coreutils, substitute md5sum | cut -d ' ' -f1.)

    jq -r '.ips[].macAddress' input.json |
    while read -r line ; do printf '%s' "$line" | md5 ; done |
    jq -s -R --slurpfile json input.json 'split("\n")
      | map(select(length>0))
      | . as $in
      | reduce range(0;length) as $i ($json[0]; .ips[$i].macAddress = $in[$i])'
    

    The second jq invocation should be read carefully. The initial arguments -s -R read the multi-line, non-JSON output produced by the while loop into jq's context as one single string, while the --slurpfile argument makes the original JSON available again (as $json) so that the calculated hashes can be written back into it.

    Since slurping reads the whole file into memory, this command might not be efficient for really large JSON files.
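
    As a quick illustration of those two mechanisms (a toy sketch, not part of the solution itself): -R -s turns multi-line raw input into one single string, and --slurpfile binds a whole file as an array variable.

    # -R -s: slurp raw lines into one string, then split and drop the trailing empty entry
    printf '%s\n' a b c | jq -R -s -c 'split("\n") | map(select(length>0))'
    # => ["a","b","c"]

    # --slurpfile: bind the entire file as an array variable; $json[0] is the document itself
    jq -n --slurpfile json input.json '$json[0].key'
    # => "value"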

  2. Another approach could be to use jq to decompose the JSON into lines of scalars, then filter for and process relevant lines outside of jq, and eventually reassemble that stream with a second call to jq.

    Here's one example using jq's stream representation for the broken-down interstage, i.e. jq -c . --stream for the decomposition and jq -n 'fromstream(inputs)' for the reassembly, with awk doing the actual processing, as it can easily read and filter by lines, alter parts of them, and shell out to perform external tasks. To filter for lines like [["ips",0,"macAddress"],"ac:5f:3e:87:d7:1a"] while passing through others like [["ips",0,"ip"],"1.2.3.4"] or [["ips",0,"macAddress"]], a simple approach is to interpret each line as columns separated by double quotes ", then select lines whose columns 2 and 4 match the given keys and whose column 6 is not empty (which could obviously be improved for robustness; this is just an example), and finally replace column 6 (using getline) with the output of printf %s on the 6th column's value, piped through your md5sum and cut processing. (Tested with onetrueawk/awk version 20231124 and GNU Awk 5.3.0.)

    jq -c . --stream input.json | awk '
      BEGIN { FS = OFS = "\"" }
      $2 == "ips" && $4 == "macAddress" && $6 {
        "printf %s " $6 " | md5sum | cut -d \\  -f1" | getline $6
      }
      1
    ' | jq -n 'fromstream(inputs)'
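
    To see why those field numbers work, here is a purely illustrative snippet that splits one of the stream lines at every double quote, just like the awk program above does:

    printf '%s\n' '[["ips",0,"macAddress"],"ac:5f:3e:87:d7:1a"]' |
      awk 'BEGIN { FS = "\"" } { for (i = 1; i <= NF; i++) print i": "$i }'
    # 1: [[
    # 2: ips
    # 3: ,0,
    # 4: macAddress
    # 5: ],
    # 6: ac:5f:3e:87:d7:1a
    # 7: ]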
    

    Here's another, more robust example that "manually" decomposes the input into value and path, while also appending a flag that marks the scalars which need further processing (queried by the jq path expression .ips[].macAddress), producing lines like "1.2.3.4" ["ips",0,"ip"] false or "ac:5f:3e:87:d7:1a" ["ips",0,"macAddress"] true. The processing part of this example then only uses POSIX-compliant shell features: read to iterate through the lines, a case statement to branch on that flag, and tr and parameter expansion to extract the value to be hashed (which is assumed not to contain spaces or escapes). The final jq composer collects the lines using reduce and successively builds up the output using setpath (a short illustration follows the pipeline below).

    jq -r '
      [ path(.ips[].macAddress) ] as $q
      | paths(type | IN("object","array") | not) as $p
      | @json "\(getpath($p)) \($p) \(IN($p;$q[]))"
    ' input.json |
    
    while read -r line; do case "$line" in
      *false) printf '%s\n' "$line" ;;
      *true) printf '"%s" %s\n' "$(printf '%s' "${line%% *}" | tr -d '"' |
             md5sum | cut -d ' ' -f1)" "${line#* }" ;;
    esac; done |
    
    jq -s 'reduce _nwise(3) as [$v,$p] (null; setpath($p;$v))'
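
    As a rough sketch of what the final composer does (a toy example with a single value/path/flag triple): jq -s slurps all whitespace-separated JSON values of the incoming lines into one flat array, _nwise(3) (an undocumented jq-internal helper) regroups them three at a time, the destructuring pattern [$v,$p] keeps the value and the path while dropping the flag, and setpath successively rebuilds the document:

    printf '%s\n' '"value" ["key"] true' |
      jq -s 'reduce _nwise(3) as [$v,$p] (null; setpath($p;$v))'
    # => {"key": "value"}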
    

    For the given input, both examples output

    {
      "key": "value",
      "ips": [
        {
          "ip": "1.2.3.4",
          "macAddress": "45ee585278a0717c642ff2cb25a8e441"
        },
        {
          "ip": "5.6.7.8",
          "macAddress": "ab47bf90cb9f385127977569e676ce70"
        },
        {
          "ip": "9.10.11.12",
          "macAddress": "a5e9785db428e3956a47776dbd00fc91"
        },
        {
          "ip": "13.14.15.16",
          "macAddress": "f75d61937f70252ff139adee241daab4"
        }
      ]
    }
    