I have a text file with values obtained by listing the bucket recursively and running gsutil stat on each object:
gsutil ls -r gs://bucket-test/** | while IFS= read -r key; do gsutil stat "$key"; done
It looks like this:
gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip:
    Creation time: Wed, 21 Dec 2022 10:39:27 GMT
    Update time: Wed, 21 Dec 2022 10:39:27 GMT
    Storage class: STANDARD
    Content-Length: 0
    Content-Type: application/zip
    Hash (crc32c): AAAAAA==
    Hash (md5): 1B2M2Y8AsgTpgAmY7PhCfg==
    ETag: CM30q9XCivwCEAE=
    Generation: 1671619167320653
    Metageneration: 1
gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L:
    Creation time: Mon, 10 Apr 2023 19:09:41 GMT
    Update time: Mon, 10 Apr 2023 19:09:41 GMT
    Storage class: STANDARD
    Content-Disposition: inline; filename=James_INGREDIENTS_A3.pdf
    Content-Length: 4381797
    Content-Type: application/pdf
    Hash (crc32c): GOzitA==
    Hash (md5): eUSLC/z70gjDB2WQKIPOuQ==
    ETag: CLGPvu+BoP4CEAE=
    Generation: 1681153781106609
    Metageneration: 1
gs://bucket-test/prova.pdf:
    Creation time: Mon, 08 May 2023 15:37:26 GMT
    Update time: Mon, 08 May 2023 15:40:12 GMT
    Storage class: STANDARD
    Content-Disposition: inline; filename=James_KEY_VISUAL_A3.pdf
    Content-Language: ace
    Content-Length: 15407
    Content-Type: application/pdf
    Metadata:
        meta-1: prova 1
        meta-2: prova 2
    Hash (crc32c): ZIrHPA==
    Hash (md5): oZbD+S8y35spkNozW3hUDA==
    ETag: CNDj09OG5v4CEAM=
    Generation: 1683560246604240
    Metageneration: 3
I need to convert the output to JSON, using the leading spaces to split it into groups: the first line of each group (the object URL, without the trailing colon) goes into the "Key" field, and each indented line becomes a field. Some fields can have subfields, for example "Metadata":
{
  "Key": "gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip",
  "Creation time": "Wed, 21 Dec 2022 10:39:27 GMT",
  "Update time": "Wed, 21 Dec 2022 10:39:27 GMT",
  "Storage class": "STANDARD",
  "Content-Length": "0",
  "Content-Type": "application/zip",
  "Hash (crc32c)": "AAAAAA==",
  "Hash (md5)": "1B2M2Y8AsgTpgAmY7PhCfg==",
  "ETag": "CM30q9XCivwCEAE=",
  "Generation": "1671619167320653",
  "Metageneration": "1"
},
{
  "Key": "gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L",
  "Creation time": "Mon, 10 Apr 2023 19:09:41 GMT",
  "Update time": "Mon, 10 Apr 2023 19:09:41 GMT",
  "Storage class": "STANDARD",
  "Content-Disposition": "inline; filename=James_INGREDIENTS_A3.pdf",
  "Content-Length": "4381797",
  "Content-Type": "application/pdf",
  "Hash (crc32c)": "GOzitA==",
  "Hash (md5)": "eUSLC/z70gjDB2WQKIPOuQ==",
  "ETag": "CLGPvu+BoP4CEAE=",
  "Generation": "1681153781106609",
  "Metageneration": "1"
},
{
  "Key": "gs://bucket-test/prova.pdf",
  "Creation time": "Mon, 08 May 2023 15:37:26 GMT",
  "Update time": "Mon, 08 May 2023 15:40:12 GMT",
  "Storage class": "STANDARD",
  "Content-Disposition": "inline; filename=James_KEY_VISUAL_A3.pdf",
  "Content-Language": "ace",
  "Content-Length": "15407",
  "Content-Type": "application/pdf",
  "Metadata": {
    "meta-1": "prova 1",
    "meta-2": "prova 2"
  },
  "Hash (crc32c)": "ZIrHPA==",
  "Hash (md5)": "oZbD+S8y35spkNozW3hUDA==",
  "ETag": "CNDj09OG5v4CEAM=",
  "Generation": "1683560246604240",
  "Metageneration": "3"
}
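I tried this command on a single object, without success:
gsutil stat gs://bucket-test/prova.pdf | printf %s "$(cat)" | jq -R -s 'split("\n") | map({key: split(": ")[0], value: split(": ")[1]})'
Instead of an object, I get an array of key/value pairs: the leading whitespace is kept, the first line gets a null value, and the Metadata entries are not nested: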
[
  {
    "key": "gs://spin8-test/prova.pdf:",
    "value": null
  },
  {
    "key": " Creation time",
    "value": " Mon, 08 May 2023 15:37:26 GMT"
  },
  {
    "key": " Update time",
    "value": " Mon, 08 May 2023 15:40:12 GMT"
  },
  {
    "key": " Storage class",
    "value": " STANDARD"
  },
  {
    "key": " Content-Disposition",
    "value": " inline; filename=James_KEY_VISUAL_A3.pdf"
  },
  {
    "key": " Content-Language",
    "value": " ace"
  },
  {
    "key": " Content-Length",
    "value": " 15407"
  },
  {
    "key": " Content-Type",
    "value": " application/pdf"
  },
  {
    "key": " Metadata",
    "value": " "
  },
  {
    "key": " meta-1",
    "value": " prova 1"
  },
  {
    "key": " meta-2",
    "value": " prova 2"
  },
  {
    "key": " Hash (crc32c)",
    "value": " ZIrHPA=="
  },
  {
    "key": " Hash (md5)",
    "value": " oZbD+S8y35spkNozW3hUDA=="
  },
  {
    "key": " ETag",
    "value": " CNDj09OG5v4CEAM="
  },
  {
    "key": " Generation",
    "value": " 1683560246604240"
  },
  {
    "key": " Metageneration",
    "value": " 3"
  }
]
Any suggestions? Thanks
2 Answers
I’ve no idea how you’d use jq to parse regular text to turn it into JSON, or if that’s even something jq is designed to do, but here’s a start using awk that handles just the input/output you show. You’d have to modify it to spot the empty value on the Metadata line and use the increase/decrease of the indent on the subsequent lines to add the necessary additional { and }.
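The awk script from this answer was not preserved. Below is a minimal sketch of the suggested extension, assuming the gsutil stat layout (URL at column 0 ending in ":", fields indented, Metadata entries indented further). It is wrapped in a hypothetical shell function named stat_to_json: a field with an empty value opens a nested object, and a return to that field's indentation closes it.

```shell
# Hypothetical helper: convert `gsutil stat` output (stdin) into a JSON array.
# Assumes the URL line starts at column 0 and ends with ":", fields are
# indented, and a field with an empty value (e.g. "Metadata:") opens a
# nested object for the more deeply indented lines that follow it.
stat_to_json() {
  awk '
    # Escape backslashes and double quotes for a JSON string literal.
    function esc(s,   out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    # Close an open nested object (e.g. Metadata), if any.
    function close_nested() { if (nested) { printf "\n    }"; nested = 0 } }
    BEGIN { printf "[\n" }
    # Object header: no leading whitespace, e.g. "gs://bucket/obj:".
    /^[^ \t]/ {
      close_nested()
      if (nobj++) printf "\n  },\n"
      key = $0; sub(/:$/, "", key)
      printf "  {\n    \"Key\": \"%s\"", esc(key)
      next
    }
    # Field line: indentation, name, colon, optional value.
    /^[ \t]+[^ \t]/ {
      match($0, /^[ \t]+/); ind = RLENGTH
      line = substr($0, ind + 1)
      p = index(line, ":")
      if (p == 0) next
      name = substr(line, 1, p - 1)
      val = substr(line, p + 1); sub(/^[ \t]+/, "", val)
      # Back at (or shallower than) the nested field indentation: close it.
      if (nested && ind <= nested_ind) close_nested()
      if (nested) {
        printf "%s\n      \"%s\": \"%s\"", (nfirst ? "" : ","), esc(name), esc(val)
        nfirst = 0
      } else if (val == "") {
        printf ",\n    \"%s\": {", esc(name)
        nested = 1; nested_ind = ind; nfirst = 1
      } else {
        printf ",\n    \"%s\": \"%s\"", esc(name), esc(val)
      }
    }
    END { close_nested(); if (nobj) printf "\n  }\n"; printf "]\n" }
  '
}
```

Usage would be something like `gsutil stat gs://bucket-test/prova.pdf | stat_to_json`, or `stat_to_json < stat.txt` for the saved file (the file name is only an example). Only one level of nesting is handled, which is enough for the Metadata block shown in the question.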
With jq, you can read in raw text using the -R flag and iterate through the lines using reduce. Start out with an empty array [], then, based on the indentation, add a new item, append to the last one, or append to the last one’s .Metadata field. Checking the indentation and parsing the line’s content are done using regular expressions with match and capture, respectively. This creates a valid JSON array (the format shown in the question, with commas between the items but no enclosing brackets, would not be valid JSON).
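The jq program from this answer was not preserved. Here is a sketch that follows the description above: raw lines via -R (with -n and inputs), reduce into an array, match to measure the indentation, and capture to split name and value. It assumes 4-space field indentation with deeper Metadata entries, as gsutil stat prints them, and wraps the filter in a hypothetical shell function named stat_to_json_jq.

```shell
# Hypothetical helper name; reads `gsutil stat` output on stdin.
# Assumes fields are indented by 4 spaces and Metadata entries by more.
stat_to_json_jq() {
  jq -Rn '
    reduce (inputs | select(length > 0)) as $line ([];
      # Number of leading whitespace characters on this line.
      ($line | match("^\\s*") | .length) as $indent
      | if $indent == 0 then
          # Object URL line: start a new item, dropping the trailing ":".
          . + [{Key: ($line | sub(":$"; ""))}]
        else
          # "Name: value" (the value may be empty, e.g. "Metadata:").
          ($line | capture("^\\s*(?<k>[^:]+):\\s*(?<v>.*)$")) as $kv
          | if $indent > 4 then .[-1].Metadata[$kv.k] = $kv.v
            elif $kv.v == "" then .[-1][$kv.k] = {}
            else .[-1][$kv.k] = $kv.v
            end
        end
    )
  '
}
```

For example, `gsutil stat gs://bucket-test/prova.pdf | stat_to_json_jq` should print a one-element array. Because jq keeps object keys in insertion order, the fields come out in the same order as in the gsutil output. Only the Metadata block is nested, as in the answer's description.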