
I’m not really sure how to phrase this question, so let me give an example: I have a large number of JSON files in two document formats. Apart from one object, the contents of each file are irrelevant to me. I want to create a normalised version of each file. These are the objects I care about, one in each format:

{
    "title": "Some data",
    "data": [
        {
            "id": "123",
            ...
        },
        {
            "id": "abc",
            ...
        }
    ]
}

and

{
  "title": "Some more data",
  "data": [
    {
      "ids": [
        {
          "id": "123",
          ...
        },
        {
          "id": "abc",
          ...
        }
      ],
      "names": [
        {
          "name": "A",
          ...
        },
        {
          "name": "B",
          ...
        }
      ]
    }
  ]
}

Each of those "object formats" is an object inside a JSON array in a file. I want to convert each of my files into a list of objects, each capturing the title, the list of id values and the list of name values in a single object:

{
  "title": "Some more data",
  "ids": [
      "123",
      "abc"
  ],
  "names": [
      "A",
      "B"
  ]
}

I use the following jq, but it doesn’t work: it creates multiple objects with the same title, one per name or id:

for f in $(find * -wholename "*.json" | sort); do
cat $f | jq '..
| if type == "object" then
    if has("data") then {
        "name": .title,
        "ids": (.data[] | [
            if has("id") then {
                "id": .id
            } else if has("ids") then {
                "ids": .ids[],
                "names": .names?
            } else null end
        end
    ])} else null end
else null end
| select(type != "null")' > "$f" ; done

EDIT: https://jqplay.org/s/uWC80Qoixxd.

2 Answers


  1. You could iterate over the outer array using .[], then construct the objects using ? and // to fall back to an alternative when a path yields nothing or evaluates to null.

    If you are okay with nulls in the complete absence of a key (as with .name in your first format), try this:

    .[] | {title} + (.data | {
      ids: map(.ids[]? // . | .id),
      names: map(.names[]? // . | .name)
    })
    
    {
      "title": "Some data",
      "ids": [
        "123",
        "abc"
      ],
      "names": [
        null,
        null
      ]
    }
    {
      "title": "Some more data",
      "ids": [
        "123",
        "abc"
      ],
      "names": [
        "A",
        "B"
      ]
    }
    

    But you could also filter out nulls using values:

    .[] | {title} + (.data | {
      ids: map(.ids[]? // . | .id | values),
      names: map(.names[]? // . | .name | values)
    })
    
    {
      "title": "Some data",
      "ids": [
        "123",
        "abc"
      ],
      "names": []
    }
    {
      "title": "Some more data",
      "ids": [
        "123",
        "abc"
      ],
      "names": [
        "A",
        "B"
      ]
    }
    

    If you want to get rid of keys with empty arrays altogether, filter them out with map_values, using a select that compares each value against []:

    .[] | {title} + (.data | {
      ids: map(.ids[]? // . | .id | values),
      names: map(.names[]? // . | .name | values)
    } | map_values(select(. != [])))
    
    {
      "title": "Some data",
      "ids": [
        "123",
        "abc"
      ]
    }
    {
      "title": "Some more data",
      "ids": [
        "123",
        "abc"
      ],
      "names": [
        "A",
        "B"
      ]
    }
    


    Edit using the modified input files: As the deeper levels use the same (relative) path (here .specs[].spec), we need some other distinguishing criterion to rule out the level with "Some title you don’t care about". Checking for the presence of a .data key seems to fit the new sample data.

    .specs[].spec | select(has("data")), .specs[]?.spec
    | {title} + (.data | {
      ids: map(.ids[]?.id // .id | values),
      names: map(.names[]? // . | .name | values)
    } | map_values(select(. != [])))
    
    {
      "title": "Some data",
      "ids": [
        "123",
        "abc"
      ]
    }
    {
      "title": "Some more data",
      "ids": [
        "123",
        "abc"
      ],
      "names": [
        "A",
        "B"
      ]
    }
    

  2. If you are okay with having names: null or names: [] in the final document for your first example, the following looks like a simple solution:

    { title }
    + (.data | {
        ids: map(.ids[].id),
        names: (map(.names[].name)? // []) # or // null
    })
    

    or, equivalently:

    {
        title,
        ids: (.data | map(.ids[].id)),
        names: (.data | map(.names[].name)? // [])
    }
    

    Output 1:

    {
      "title": "Some more data",
      "ids": [
        "123",
        "abc"
      ],
      "names": []
    }
    

    Output 2:

    {
      "title": "Some more data",
      "ids": [
        "123",
        "abc"
      ],
      "names": [
        "A",
        "B"
      ]
    }
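
The two answers place their fallback at different levels, which matters for files in the first format. The following is a small sketch of that difference, assuming jq is available on the PATH; the variable name first is made up for illustration and holds one .data array whose elements have no names key.

```shell
# One .data array in the first format: elements carry "id" but no "names".
first='[{"id":"123"},{"id":"abc"}]'

# Per-element fallback (first answer): .names[]? yields nothing for an element
# without "names", so // falls back to the element itself, and .name on it is null.
printf '%s\n' "$first" | jq -c 'map(.names[]? // . | .name)'           # → [null,null]

# Appending `values` drops those nulls again.
printf '%s\n' "$first" | jq -c 'map(.names[]? // . | .name | values)'  # → []

# Whole-expression fallback (second answer): iterating over the missing "names"
# raises an error, ? suppresses it, and // supplies [] for the entire map.
printf '%s\n' "$first" | jq -c 'map(.names[].name)? // []'             # → []
```

The per-element form keeps working when a single array mixes both element shapes, whereas the whole-expression form replaces the entire result as soon as one element lacks the key.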
    
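To run the first answer's filter over every file, as the loop in the question attempts, something along these lines might work. It is only a sketch: normalise_all is a hypothetical helper name, and it assumes that each file holds a top-level array of objects in one of the two original formats (not the later .specs layout). The filter is wrapped in map so that each file becomes a single JSON array. In the question's filter, .data[] inside the object constructor emits one object per element, which is where the repeated titles come from.

```shell
# Hypothetical helper: normalise every *.json file below a directory, in place.
# Assumes each file contains a top-level JSON array of objects.
normalise_all() {
  dir=$1
  find "$dir" -type f -name '*.json' | sort | while IFS= read -r f; do
    tmp="$f.tmp"
    # Write to a temporary file first: redirecting straight to "$f" in the
    # same pipeline can truncate the input before it has been read.
    if jq 'map({title} + (.data | {
             ids:   map(.ids[]?   // . | .id   | values),
             names: map(.names[]? // . | .name | values)
           }))' "$f" > "$tmp"; then
      mv "$tmp" "$f"
    else
      # Leave the original file untouched if jq rejects it.
      rm -f "$tmp"
      echo "skipped $f: jq failed" >&2
    fi
  done
}

# Usage: normalise_all /path/to/json/files
```

Reading each path with read -r instead of iterating over $(find ...) keeps file names containing spaces intact, and the original file is only replaced once jq has succeeded.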