skip to Main Content

I have an json file containing 1 array of thousand documents.
I need to convert file to csv format with specific values only.

Each document can have another combination of fields or sub-documents or, (and this is what makes me an problem), sometimes value of a field is array of subdocuments, in the other case it’s value is just subcdoc or field just does not exists.

My goal is to extract value of .meta.dev.devId (if field does not exists, then use empty column in csv.
Docs e.g. in the base array:

1st doc:
{ 
    "docId": "some_hex", 
    "bin": { 
        "field": "value", 
        "length": "123456"
    },
    "item": "item",
    "meta": { 
        "date": "dateFnc(123456789)",
        "name": "name",
        "dev": {
            "devId": "987654321",
            "devName": "name"
        },
    "type": "type1" 
    }
}
2nd doc:
{ 
    "docId": "some_hex", 
    "bin": { 
        "field": "value", 
        "length": "123456"
    },
    "item": "item",
    "meta": { 
        "date": "dateFnc(123456789)",
        "name": "name",
        "dev": [ {
            "devId": "987654321",
            "devName": "name"
        } ],
    "type": "type1" 
    }
}
3rd doc:
{ 
    "docId": "some_hex", 
    "bin": { 
        "field": "value", 
        "length": "123456"
    },
    "item": "item",
    "meta": { 
        "date": "dateFnc(123456789)",
        "name": "name",
    "type": "type1" 
    }
}

Command 1:
jq -r '.[] | [.docId, .item, .meta.date, .meta.name, .meta.dev[]?.devId] | @csv' file.json > file.csv

Result 1:
Cannot index string with string “devId”

Command 2:
jq -r '.[] | [.docId, .item, .meta.date, .meta.name, .meta.dev[].devId] | @csv' file.json > file.csv

Result 2:
Cannot iterate over null (null)

Command 3:
jq -r '.[] | [.docId, .item, .meta.date, .meta.name, .meta.dev.devId] | @csv' file.json > file.csv

Result 3:
Cannot index array with string “devId”

How I understand commands:

.[] | [.meta.dev.devId] → Expected value: “string”

.[] | [.meta.dev[].devId] → Expected value: [array]

.[] | [.meta.dev[]?.devId] → Expected value: [array] or not an array

2

Answers


  1. You can use an if-then-else-end on the type to map() over devId when dev is an array, or just use .devId if it’s not:

    .[] | [ .meta?.dev? | if type == "array" then map(.devId)[] else .devId? end ] | @csv
    

    Result with an array with your 3 input docs:

    ""987654321""
    ""987654321""
    ""
    

    JqPlay Demo
    Login or Signup to reply.
  2. You could traverse to .meta.dev, then filter by type using the shortcuts objects and arrays, combined with a type-specific action (e.g. extracting the first array item’s field with arrays[0].devId). If .meta.dev didn’t exist, it would yield null, so you could use the same technique to provide for a default value (e.g. nulls | "missing").

    < file.json > file.csv jq -r '.[] | [
      .docId, .item, .meta.date, .meta.name,
      (.meta.dev | objects.devId, arrays[0].devId, (nulls | "missing"))
    ] | @csv'
    

    Demo


    Alternatively [sic], you could use the Destructuring Alternative Operator ?// to provide for the different structures expected ([{$devId}] or {$devId}):

    < file.json > file.csv jq -r '.[] | [
      .docId, .item, .meta.date, .meta.name,
      (.meta.dev as [{$devId}] ?// {$devId} | $devId) // "missing"
    ] | @csv'
    

    Demo


    Contents of file.csv:

    "some_hex","item","dateFnc(123456789)","name","987654321"
    "some_hex","item","dateFnc(123456789)","name","987654321"
    "some_hex","item","dateFnc(123456789)","name","missing"
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search