Suppose I have the following JSON document (inspired by this post)
Initial document
{
  "key": "value",
  "ips": [
    {
      "ip": "1.2.3.4",
      "macAddress": "ac:5f:3e:87:d7:1a"
    },
    {
      "ip": "5.6.7.8",
      "macAddress": "ac:5f:3e:87:d7:2a"
    },
    {
      "ip": "9.10.11.12",
      "macAddress": "ac:5f:3e:87:d7:3a"
    },
    {
      "ip": "13.14.15.16",
      "macAddress": "42:12:20:2e:2b:ca"
    }
  ]
}
Now I would like to read every macAddress, pass it to a hash function (e.g. md5sum) and write the result back to the JSON document.
Desired output
{
  "key": "value",
  "ips": [
    {
      "ip": "1.2.3.4",
      "macAddress": "45ee585278a0717c642ff2cb25a8e441"
    },
    {
      "ip": "5.6.7.8",
      "macAddress": "ab47bf90cb9f385127977569e676ce70"
    },
    {
      "ip": "9.10.11.12",
      "macAddress": "a5e9785db428e3956a47776dbd00fc91"
    },
    {
      "ip": "13.14.15.16",
      "macAddress": "f75d61937f70252ff139adee241daab4"
    }
  ]
}
Currently I have the following shell script, but I think it can be done more elegantly, preferably in a one-liner.
json_doc="{"key": "value", "ips": [{"ip":"1.2.3.4","macAddress":"ac:5f:3e:87:d7:1a"},{"ip":"5.6.7.8","macAddress":"ac:5f:3e:87:d7:2a"},{"ip":"9.10.11.12","macAddress":"ac:5f:3e:87:d7:3a"},{"ip":"13.14.15.16","macAddress":"42:12:20:2e:2b:ca"}]}"
ip_list=$(jq -c '.ips[]' <<< "$json_doc" |
while read -r jsonline ; do
hashmac="$(jq -s -j '.[] | .macAddress' <<<"$jsonline" | md5sum | cut -d ' ' -f1)"
jq --arg hashmac "$hashmac" -s -r '.[] | .macAddress |= "($hashmac)"' <<<"$jsonline"
done | jq -s)
# Update json document with ip list containing hashed mac addresses
jq --argjson ips "$ip_list" '.ips = $ips' <<<"$json_doc"
2 Answers
A variation of peak’s answer from the linked question. Two invocations of jq: the first calculates the md5 hashes, and the second reconstructs the calculated result back into the original JSON using reduce.

The second jq invocation should be read carefully. The initial arguments -s -R read the multi-line, non-JSON output created by the for-loop into jq’s context, while the --slurpfile argument is needed to write the calculated hashes back into the original JSON. The slurp action takes the whole file into memory, so this command might not be efficient for really large JSON files.
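A minimal sketch of such a pipeline, assuming the original document is stored in a file named doc.json (the file name and the concrete reduce filter are assumptions, not the answer's verbatim code), could look like this:

for mac in $(jq -r '.ips[].macAddress' doc.json); do
  # hash each MAC without a trailing newline, one hash per output line
  printf '%s' "$mac" | md5sum | cut -d ' ' -f1
done |
jq -s -R --slurpfile doc doc.json '
  # "." is the slurped raw text holding one md5 hash per line;
  # $doc[0] is the original document loaded via --slurpfile
  (split("\n") | map(select(length > 0))) as $hashes
  | reduce range(0; $hashes | length) as $i ($doc[0];
      .ips[$i].macAddress = $hashes[$i])
'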
Another approach could be to use jq to decompose the JSON into lines of scalars, then filter for and process relevant lines outside of jq, and eventually reassemble that stream with a second call to jq.

Here’s one example using jq’s stream representation for the broken-down interstage, i.e. jq -c . --stream for the decomposition, jq -n 'fromstream(inputs)' for the reassembly, and awk for the actual processing, as it can easily read and filter by lines, alter parts of them, and shell out to perform external tasks. To filter for lines like [["ips",0,"macAddress"],"ac:5f:3e:87:d7:1a"] while waving through others like [["ips",0,"ip"],"1.2.3.4"] or [["ips",0,"macAddress"]], a simple approach could be to interpret each line as columns separated by double quotes ("), then filter for columns 2 and 4 matching a given content and column 6 not being empty (which could obviously be improved for robustness; this is just an example), then replace column 6 (using getline) with the output of printf %s on the 6th column’s value, followed by your md5sum and cut processing. (Tested with onetrueawk/awk version 20231124, and GNU Awk 5.3.0.)
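A sketch of what such an awk-based pipeline could look like, again assuming the input lives in doc.json (the concrete awk program below illustrates the described column filtering and is not necessarily the answer's verbatim script):

jq -c . --stream doc.json |
awk '
  BEGIN { FS = OFS = "\"" }          # treat the double quote as column separator
  # value lines look like [["ips",0,"macAddress"],"ac:5f:3e:87:d7:1a"]:
  # column 2 is "ips", column 4 is "macAddress", column 6 holds the value
  $2 == "ips" && $4 == "macAddress" && $6 != "" {
    cmd = "printf %s \"" $6 "\" | md5sum | cut -d \" \" -f1"
    cmd | getline hash               # read the computed md5 hash
    close(cmd)
    $6 = hash                        # substitute the hash for the MAC address
  }
  { print }                          # pass every (possibly modified) line on
' |
jq -n 'fromstream(inputs)'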
Here’s another, more robust example that "manually" decomposes the input into value and path, while also appending a flag to mark the scalars that need further processing (queried by the jq path expression .ips[].macAddress), into lines like "1.2.3.4" ["ips",0,"ip"] false or "ac:5f:3e:87:d7:1a" ["ips",0,"macAddress"] true. The processing part of this example then only utilizes POSIX-compliant shell features like read to iterate through the lines, a case statement to deflect based on that flag, and tr and parameter expansion to extract the value to be hashed (which is assumed to not contain spaces or escapes). The final jq composer then collects the lines using reduce, and successively builds up the output using setpath.
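A sketch of that variant, under the same doc.json assumption (both the decomposition filter and the shell loop below are illustrative reconstructions of the described steps, not the answer's exact code):

jq -r '
  [path(.ips[].macAddress)] as $targets        # paths whose values shall be hashed
  | paths(scalars) as $p
  | "\(getpath($p) | tojson) \($p | tojson) \(any($targets[]; . == $p))"
' doc.json |
while read -r value path flag; do
  case $flag in
    true)
      # strip the surrounding JSON quotes, hash the bare value, re-emit as a JSON string
      mac=$(printf '%s' "$value" | tr -d '"')
      hash=$(printf '%s' "$mac" | md5sum | cut -d ' ' -f1)
      printf '"%s" %s\n' "$hash" "$path" ;;
    *)
      printf '%s %s\n' "$value" "$path" ;;
  esac
done |
jq -n -R '
  # each line is "<JSON value> <JSON path>"; rebuild the document with setpath
  reduce (inputs / " ") as [$value, $path] ({};
    setpath($path | fromjson; $value | fromjson))
'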
For the given input, both examples output the desired JSON document shown above.