I currently have one JSON file, formatted as follows:
{
  "Comment": "json data",
  "Changes": [
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "record1",
        "Type": "CNAME",
        "SetIdentifier": "record1-ap-northeast",
        "GeoLocation": {
          "CountryCode": "JP"
        },
        "TTL": 60,
        "ResourceRecords": [
          {
            "Value": "record1"
          }
        ],
        "HealthCheckId": "ID"
      }
    },
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "record2",
        "Type": "CNAME",
        "SetIdentifier": "record2-ap-south",
        "GeoLocation": {
          "CountryCode": "SG"
        },
        "TTL": 60,
        "ResourceRecords": [
          {
            "Value": "record2"
          }
        ],
        "HealthCheckId": "ID"
      }
    },
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "record3",
        "Type": "CNAME",
        "SetIdentifier": "record3-ap-west",
        "GeoLocation": {
          "CountryCode": "IN"
        },
        "TTL": 60,
        "ResourceRecords": [
          {
            "Value": "record3"
          }
        ],
        "HealthCheckId": "ID"
      }
    },
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "record4.",
        "Type": "CNAME",
        "SetIdentifier": "record4",
        "GeoLocation": {
          "CountryCode": "*"
        },
        "TTL": 60,
        "ResourceRecords": [
          {
            "Value": "record4-ap-west"
          }
        ],
        "HealthCheckId": "ID"
      }
    }
  ]
}
The original file has 20,000 such values under the "Changes" key. I want to split it into files with 830 values each, creating as many files as that requires. To achieve this, I need each file in the format below:
{
  "Comment": "json data",
  "Changes": [
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "record4.",
        "Type": "CNAME",
        "SetIdentifier": "record4",  # 830 such objects in each file
        "GeoLocation": {
          "CountryCode": "*"
        },
        "TTL": 60,
        "ResourceRecords": [
          {
            "Value": "record4-ap-west"
          }
        ],
        "HealthCheckId": "ID"
      }
    }
  ]
}
I’ve created the shell script below to do this:
#!/bin/bash
# Set the input file name
input_file="input.json"
# Set the output file prefix
output_file_prefix="output"
# Set the number of objects per output file
objects_per_file=830
# Skip the first two lines of the input file
tail -n +3 "$input_file" > temp.json
# Get the total number of lines in the input file
total_lines=$(wc -l < temp.json)
# Calculate the number of output files needed
output_files=$(((total_lines + objects_per_file - 1) / objects_per_file))
# Split the input file into multiple output files
split -l $objects_per_file temp.json "$output_file_prefix"
# Loop through each output file and add the opening and closing square brackets
for file in "$output_file_prefix"*; do
echo "[" > "$file".json
cat "$file" >> "$file".json
echo "]" >> "$file".json
rm "$file"
done
# Remove the temporary file
rm temp.json
**By using this I am getting output in the expected shape, but it is broken, as it counts 830 lines and not 830 objects.**
Format:
#start of file
[
  {
    "Action": "DELETE",
    "ResourceRecordSet": {
      "Name": "record1",
      "Type": "CNAME",
      "SetIdentifier": "record1-ap-northeast",
      "GeoLocation": {
        "CountryCode": "JP"
      },
      "TTL": 60,
      "ResourceRecords": [
        {
          "Value": "record1"
        }
      ],
      "HealthCheckId": "ID"
    }
  },
#end of file
  {
    "Action": "DELETE",
    "ResourceRecordSet": {
      "Name": "record4.",
      "Type": "CNAME",
      "SetIdentifier": "record4"
]
How do I achieve the required result? Due to a character limitation, I cannot use more than 830 such objects in each file.
I tried using the jq tool to achieve this, but I am completely new to it. Could you please help me with this?
2 Answers
If you wish to use jq, you will have to do it in two or three steps. Each step, however, is very easy.
The first step uses jq with the `-c` option to create a JSON Lines file with the JSON objects you want:
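For the input shown in the question, that first step could look like this (`input.json` is the name from the question’s script; `output.jsonl` is the file used below):

```sh
# Emit each element of the "Changes" array as one compact JSON object per line.
jq -c '.Changes[]' input.json > output.jsonl
```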
Next, partition output.jsonl into the files you want. This can be done in many ways, e.g. using awk, or even the shell’s `read`.
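For example, a minimal awk sketch, assuming 830 objects per chunk and an illustrative `chunk_` file prefix:

```sh
# Each input line is exactly one object, so counting lines counts objects.
awk -v per=830 '{
  f = sprintf("chunk_%04d.jsonl", int((NR - 1) / per))  # 0-based chunk index
  print > f
}' output.jsonl
```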
Finally, if you want the separate files to be "pretty-printed", you could use jq to do that in the obvious way.
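Since the question also needs every file wrapped back into the original envelope, here is a sketch that rebuilds it and pretty-prints in one go (the "Comment" text is copied from the question’s sample):

```sh
# -s slurps each chunk's objects into one array; jq pretty-prints by default.
for f in chunk_*.jsonl; do
  jq -s '{Comment: "json data", Changes: .}' "$f" > "${f%.jsonl}.json"
done
```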
So even assuming some of the JSON "rows" are compacted into one line (e.g. the equivalent of `jq -c`) while others are pretty-printed in a tree format, all you need is the right regex in awk to identify its row delimiter/separator (`RS`):
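A minimal sketch with gawk (GNU awk treats a multi-character `RS` as a regex), assuming the exact layout shown in the question, where every element of "Changes" ends with a pair of closing braces on their own lines; the `part_` file names are illustrative:

```sh
gawk -v per=830 '
  BEGIN { RS = "\n}\n},?\n" }        # the brace pair that closes every Change object
  NR == 1 { sub(/^[{][^{]*/, "") }   # strip the {"Comment": ..., "Changes": [ header from the first record
  /"Action"/ {                       # ignore the trailing "]}"-only record
    out = sprintf("part_%04d.txt", int(n++ / per))
    printf "%s%s\n}\n}", sep[out], $0 > out   # restore the braces RS consumed
    sep[out] = ",\n"                 # comma-separate later records in the same file
  }
' input.json
```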
"Change Key"
records, then outputing every 830 rows should be relative straight-forward.you can pipe the output of that further downstream to confirm the output is valid
JSON
via :as long as the input structure is very well defined, then
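For instance, a sketch that wraps each part file back into the required envelope (header and footer copied from the question’s format) and lets `jq empty` report any parse error:

```sh
for f in part_*.txt; do
  out="${f%.txt}.json"
  { printf '{\n"Comment":"json data",\n"Changes":[\n'
    cat "$f"
    printf '\n]\n}\n'
  } > "$out"
  jq empty "$out" && echo "$out: valid JSON"   # jq exits non-zero on invalid JSON
done
```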
As long as the input structure is very well defined, awk can handle JSON just fine instead of needing a dedicated parser.