
I have a text file containing the output of the following command, which runs gsutil stat on every object in a bucket: gsutil ls -r gs://bucket-test/** | while IFS= read -r key; do gsutil stat "$key"; done. It looks like this:

gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip:
    Creation time:          Wed, 21 Dec 2022 10:39:27 GMT
    Update time:            Wed, 21 Dec 2022 10:39:27 GMT
    Storage class:          STANDARD
    Content-Length:         0
    Content-Type:           application/zip
    Hash (crc32c):          AAAAAA==
    Hash (md5):             1B2M2Y8AsgTpgAmY7PhCfg==
    ETag:                   CM30q9XCivwCEAE=
    Generation:             1671619167320653
    Metageneration:         1
gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L:
    Creation time:          Mon, 10 Apr 2023 19:09:41 GMT
    Update time:            Mon, 10 Apr 2023 19:09:41 GMT
    Storage class:          STANDARD
    Content-Disposition:    inline; filename=James_INGREDIENTS_A3.pdf
    Content-Length:         4381797
    Content-Type:           application/pdf
    Hash (crc32c):          GOzitA==
    Hash (md5):             eUSLC/z70gjDB2WQKIPOuQ==
    ETag:                   CLGPvu+BoP4CEAE=
    Generation:             1681153781106609
    Metageneration:         1
gs://bucket-test/prova.pdf:
    Creation time:          Mon, 08 May 2023 15:37:26 GMT
    Update time:            Mon, 08 May 2023 15:40:12 GMT
    Storage class:          STANDARD
    Content-Disposition:    inline; filename=James_KEY_VISUAL_A3.pdf
    Content-Language:       ace
    Content-Length:         15407
    Content-Type:           application/pdf
    Metadata:               
        meta-1:             prova 1
        meta-2:             prova 2
    Hash (crc32c):          ZIrHPA==
    Hash (md5):             oZbD+S8y35spkNozW3hUDA==
    ETag:                   CNDj09OG5v4CEAM=
    Generation:             1683560246604240
    Metageneration:         3

I need to convert the output to JSON, using the leading spaces to determine structure and assigning the first line of each group to the "Key" field. There may also be subfields, for example under the "Metadata" value:

{
  "Key": "gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip",
  "Creation time": "Wed, 21 Dec 2022 10:39:27 GMT",
  "Update time": "Wed, 21 Dec 2022 10:39:27 GMT",
  "Storage class": "STANDARD",
  "Content-Length": "0",
  "Content-Type": "application/zip",
  "Hash (crc32c)": "AAAAAA==",
  "Hash (md5)": "1B2M2Y8AsgTpgAmY7PhCfg==",
  "ETag": "CM30q9XCivwCEAE=",
  "Generation": "1671619167320653",
  "Metageneration": "1"
},
{
  "Key": "gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L",
  "Creation time": "Mon, 10 Apr 2023 19:09:41 GMT",
  "Update time": "Mon, 10 Apr 2023 19:09:41 GMT",
  "Storage class": "STANDARD",
  "Content-Disposition": "inline; filename=James_INGREDIENTS_A3.pdf",
  "Content-Length": "4381797",
  "Content-Type": "application/pdf",
  "Hash (crc32c)": "GOzitA==",
  "Hash (md5)": "eUSLC/z70gjDB2WQKIPOuQ==",
  "ETag": "CLGPvu+BoP4CEAE=",
  "Generation": "1681153781106609",
  "Metageneration": "1"
},
{
  "Key": "gs://bucket-test/prova.pdf",
  "Creation time": "Mon, 08 May 2023 15:37:26 GMT",
  "Update time": "Mon, 08 May 2023 15:40:12 GMT",
  "Storage class": "STANDARD",
  "Content-Disposition": "inline; filename=James_KEY_VISUAL_A3.pdf",
  "Content-Language": "ace",
  "Content-Length": "15407",
  "Content-Type": "application/pdf",
  "Metadata": {
    "meta-1": "prova 1",
    "meta-2": "prova 2"
  },
  "Hash (crc32c)": "ZIrHPA==",
  "Hash (md5)": "oZbD+S8y35spkNozW3hUDA==",
  "ETag": "CNDj09OG5v4CEAM=",
  "Generation": "1683560246604240",
  "Metageneration": "3"
}

I tried this command on a single group, but without success:
gsutil stat gs://bucket-test/prova.pdf | printf %s "$(cat)" | jq -R -s 'split("\n") | map({key: split(": ")[0], value: split(": ")[1]})'

The result is a JSON array of key/value objects rather than a single object:

[
  {
    "key": "gs://spin8-test/prova.pdf:",
    "value": null
  },
  {
    "key": "    Creation time",
    "value": "         Mon, 08 May 2023 15:37:26 GMT"
  },
  {
    "key": "    Update time",
    "value": "           Mon, 08 May 2023 15:40:12 GMT"
  },
  {
    "key": "    Storage class",
    "value": "         STANDARD"
  },
  {
    "key": "    Content-Disposition",
    "value": "   inline; filename=James_KEY_VISUAL_A3.pdf"
  },
  {
    "key": "    Content-Language",
    "value": "      ace"
  },
  {
    "key": "    Content-Length",
    "value": "        15407"
  },
  {
    "key": "    Content-Type",
    "value": "          application/pdf"
  },
  {
    "key": "    Metadata",
    "value": "              "
  },
  {
    "key": "        meta-1",
    "value": "            prova 1"
  },
  {
    "key": "        meta-2",
    "value": "            prova 2"
  },
  {
    "key": "    Hash (crc32c)",
    "value": "         ZIrHPA=="
  },
  {
    "key": "    Hash (md5)",
    "value": "            oZbD+S8y35spkNozW3hUDA=="
  },
  {
    "key": "    ETag",
    "value": "                  CNDj09OG5v4CEAM="
  },
  {
    "key": "    Generation",
    "value": "            1683560246604240"
  },
  {
    "key": "    Metageneration",
    "value": "        3"
  }
]

Any suggestions? Thanks

2 Answers


  1. I’ve no idea how you’d use jq to parse regular text into JSON, or whether that’s even something jq is designed to do, but here’s a start using awk that handles just the input/output you show:

    $ cat tst.awk
    /^[^[:space:]]/ { if (NR>1) prtRec() }
    { rec = rec $0 RS }
    END { prtRec(); print "" }
    
    function prtRec(        lines,numLines,lineNr,line,tag,val,depth) {
        printf "%s", recSep
        recSep = "," ORS
    
        print "{"
    
        numLines = split(rec,lines,RS) - 1
        for ( lineNr=1; lineNr<=numLines; lineNr++ ) {
            line = lines[lineNr]
            gsub(/^[[:space:]]+|[[:space:]]+$/,"",line)
            tag = val = line
            if ( lineNr == 1 ) {
                tag = "Key"
                sub(/:$/,"",val)
            }
            else {
                sub(/[[:space:]]*:.*/,"",tag)
                sub(/[^:]+:[[:space:]]*/,"",val)
            }
            printf "  \"%s\": \"%s\"%s\n", tag, val, (lineNr<numLines ? "," : "")
        }
    
        printf "}"
    
        rec = ""
    }
    

    $ awk -f tst.awk file
    {
      "Key": "gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip",
      "Creation time": "Wed, 21 Dec 2022 10:39:27 GMT",
      "Update time": "Wed, 21 Dec 2022 10:39:27 GMT",
      "Storage class": "STANDARD",
      "Content-Length": "0",
      "Content-Type": "application/zip",
      "Hash (crc32c)": "AAAAAA==",
      "Hash (md5)": "1B2M2Y8AsgTpgAmY7PhCfg==",
      "ETag": "CM30q9XCivwCEAE=",
      "Generation": "1671619167320653",
      "Metageneration": "1"
    },
    {
      "Key": "gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L",
      "Creation time": "Mon, 10 Apr 2023 19:09:41 GMT",
      "Update time": "Mon, 10 Apr 2023 19:09:41 GMT",
      "Storage class": "STANDARD",
      "Content-Disposition": "inline; filename=James_INGREDIENTS_A3.pdf",
      "Content-Length": "4381797",
      "Content-Type": "application/pdf",
      "Hash (crc32c)": "GOzitA==",
      "Hash (md5)": "eUSLC/z70gjDB2WQKIPOuQ==",
      "ETag": "CLGPvu+BoP4CEAE=",
      "Generation": "1681153781106609",
      "Metageneration": "1"
    },
    {
      "Key": "gs://bucket-test/prova.pdf",
      "Creation time": "Mon, 08 May 2023 15:37:26 GMT",
      "Update time": "Mon, 08 May 2023 15:40:12 GMT",
      "Storage class": "STANDARD",
      "Content-Disposition": "inline; filename=James_KEY_VISUAL_A3.pdf",
      "Content-Language": "ace",
      "Content-Length": "15407",
      "Content-Type": "application/pdf",
      "Metadata": "",
      "meta-1": "prova 1",
      "meta-2": "prova 2",
      "Hash (crc32c)": "ZIrHPA==",
      "Hash (md5)": "oZbD+S8y35spkNozW3hUDA==",
      "ETag": "CNDj09OG5v4CEAM=",
      "Generation": "1683560246604240",
      "Metageneration": "3"
    }
    

    You’d just have to modify it to spot the missing val on the Metadata line and use the increase/decrease of the indent on the subsequent lines to add the necessary additional { and }.

  2. With jq, you can read in raw text using the -R flag and iterate through the lines using reduce. Start out with an empty array [], then, based on the indentation, add a new item, append to the last item, or append to the last item’s .Metadata field. Checking the indentation and parsing the line’s content are done using regular expressions with match and capture, respectively:

    jq -Rn '
      reduce (inputs | {
        ind: match("^\\s*").length,
        cap: capture("\\s*(?<key>.*):(\\s+(?<value>.*))?$")
      }) as {$ind, $cap} ([];
        if $ind == 0 then . + [$cap | {key}]
        elif $ind == 4 then last += ([$cap | select(.key == "Metadata").value = {}] | from_entries)
        elif $ind == 8 then last.Metadata += ([$cap] | from_entries)
        else . end
      )
    '
    

    This produces a valid JSON array. The comma-separated objects shown in the question, without the enclosing brackets, would not be valid JSON:

    [
      {
        "key": "gs://bucket-test/4e123978-8eed-43ae-f521-8fba54c704ea.zip",
        "Creation time": "Wed, 21 Dec 2022 10:39:27 GMT",
        "Update time": "Wed, 21 Dec 2022 10:39:27 GMT",
        "Storage class": "STANDARD",
        "Content-Length": "0",
        "Content-Type": "application/zip",
        "Hash (crc32c)": "AAAAAA==",
        "Hash (md5)": "1B2M2Y8AsgTpgAmY7PhCfg==",
        "ETag": "CM30q9XCivwCEAE=",
        "Generation": "1671619167320653",
        "Metageneration": "1"
      },
      {
        "key": "gs://bucket-test/GKiSQMZ5rAqrSWwur/uploads/GENERAL/SNrQD97nzQN9eDLeA/AAZYefiL5CT8pxe4L",
        "Creation time": "Mon, 10 Apr 2023 19:09:41 GMT",
        "Update time": "Mon, 10 Apr 2023 19:09:41 GMT",
        "Storage class": "STANDARD",
        "Content-Disposition": "inline; filename=James_INGREDIENTS_A3.pdf",
        "Content-Length": "4381797",
        "Content-Type": "application/pdf",
        "Hash (crc32c)": "GOzitA==",
        "Hash (md5)": "eUSLC/z70gjDB2WQKIPOuQ==",
        "ETag": "CLGPvu+BoP4CEAE=",
        "Generation": "1681153781106609",
        "Metageneration": "1"
      },
      {
        "key": "gs://bucket-test/prova.pdf",
        "Creation time": "Mon, 08 May 2023 15:37:26 GMT",
        "Update time": "Mon, 08 May 2023 15:40:12 GMT",
        "Storage class": "STANDARD",
        "Content-Disposition": "inline; filename=James_KEY_VISUAL_A3.pdf",
        "Content-Language": "ace",
        "Content-Length": "15407",
        "Content-Type": "application/pdf",
        "Metadata": {
          "meta-1": "prova 1",
          "meta-2": "prova 2"
        },
        "Hash (crc32c)": "ZIrHPA==",
        "Hash (md5)": "oZbD+S8y35spkNozW3hUDA==",
        "ETag": "CNDj09OG5v4CEAM=",
        "Generation": "1683560246604240",
        "Metageneration": "3"
      }
    ]
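
The first answer leaves the Metadata nesting as an exercise. What follows is a minimal sketch of that suggestion, not the answerer's code. The function name stat_to_json is hypothetical. It assumes that a field with an empty value opens a nested object, that there is only one level of nesting, that lines indented deeper than the opening field belong to it, and that each line splits at its first colon. It wraps the records in an array, as the second answer does, and escapes only backslashes and double quotes.

```shell
# Hypothetical helper (not part of gsutil): turns `gsutil stat`-style text
# into a JSON array on stdout, nesting indented sub-fields one level deep.
stat_to_json() {
  awk '
    # Escape backslashes and double quotes so each value is a valid JSON string.
    function esc(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    # Close the nested object (for example Metadata) if one is open.
    function closeSub() {
      if (inSub) { printf "%s}", (subCount ? "\n    " : ""); inSub = 0 }
    }
    # Close the current record if one is open.
    function closeRec() {
      closeSub()
      if (inRec) { printf "\n  }"; inRec = 0 }
    }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      indent = length($0) - length(line)      # width of the leading whitespace
      sub(/[ \t\r]+$/, "", line)
      if (line == "") next

      if (indent == 0) {                      # header line: "gs://bucket/object:"
        closeRec()
        sub(/:$/, "", line)
        printf "%s  {\n    \"Key\": \"%s\"", (nRec++ ? ",\n" : "[\n"), esc(line)
        inRec = 1
        next
      }

      pos = index(line, ":")                  # split on the first colon only
      if (!inRec || pos == 0) next
      if (inSub && indent <= subIndent) closeSub()   # a dedent ends the nested object

      tag = substr(line, 1, pos - 1)
      val = substr(line, pos + 1)
      sub(/^[ \t]+/, "", val)

      if (val == "" && !inSub) {              # empty value: open a nested object
        printf ",\n    \"%s\": {", esc(tag)
        inSub = 1; subIndent = indent; subCount = 0
      } else if (inSub) {
        printf "%s      \"%s\": \"%s\"", (subCount++ ? ",\n" : "\n"), esc(tag), esc(val)
      } else {
        printf ",\n    \"%s\": \"%s\"", esc(tag), esc(val)
      }
    }
    END { closeRec(); print (nRec ? "\n]" : "[]") }
  ' "$@"
}
```

It can be used as stat_to_json file, or at the end of the gsutil pipeline from the question. A field that legitimately has an empty value would come out as {} under these assumptions, so adjust the rule if your data has such fields.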
    

