
I have a JSON structure like this:

{
"objectSchema": {
"fields": {
  "fieldArray": [
    {
      "a": "b",
      "c": "d"
    },
    {
      "x": 1,
      "y": "z"
    },
    {
      "x": 1,
      "y": "z"
    }
  ]
}
}
}

My current jq –

.objectSchema
| def convertToSchema:
    if type == "array" then
      if length == 0 then {"type": "array", "items": {"type": "string"}}
      else {"type": "array", "items": (map(convertToSchema) | add)}
      end
    elif type == "object" then
      {"type": "object",
       "properties": (map_values(if type == "object" or type == "array" then convertToSchema else convertToSchema end))}
    elif type == "boolean" then {"type": "boolean"}
    elif type == "number" then {"type": "number"}
    elif type == "string" then {"type": "string"}
    elif type == "null" then {"type": "string"}
    else {"type": (type | tostring)}
    end;
  convertToSchema

My current output / converted JSON schema –

{
"type": "object",
"properties": {
"fields": {
  "type": "object",
  "properties": {
    "fieldArray": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "x": {
            "type": "number"
          },
          "y": {
            "type": "string"
          }
        }
      }
    }
  }
}
}
}

My desired output / JSON schema –

{
"type": "object",
"properties": {
"fields": {
  "type": "object",
  "properties": {
    "fieldArray": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "a": {
            "type": "string"
          },
          "c": {
            "type": "string"
          },
          "x": {
            "type": "number"
          },
          "y": {
            "type": "string"
          }
        }
      }
    }
  }
}
}
}

My current jq expression only converts the last item of the array into the schema; the properties of the earlier items are lost.

Please suggest a jq expression that handles this for arbitrarily nested JSON objects: every item in an array should be considered, and the per-item schemas should be merged into one combined schema that includes all unique properties from all items in the array.

A jqplay link with a working solution would be especially helpful.

2 Answers


  1. There is a simple but generic "structural schema inference engine"
    at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed (schema.jq).
    It produces a schema very similar to the one you describe (see below),
    but if that is not directly acceptable, you could either add a filter
    to convert it to the form you want, or tweak schema.jq itself.

    With schema.jq in the pwd ("."), and using your file as input.json:

    < input.json jq 'include "schema" {search: "."}; schema'
    

    produces:

    {
      "objectSchema": {
        "fields": {
          "fieldArray": [
            {
              "a": "string",
              "c": "string",
              "x": "number",
              "y": "string"
            }
          ]
        }
      }
    }
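
    As a rough illustration of the "add a filter to convert it" option, the compact form above could be mapped onto the JSON Schema layout from the question with a small recursive filter. This is only a sketch: toJsonSchema is a hypothetical helper and not part of schema.jq, and it assumes every leaf is a type name and every array holds a single item schema, as in the output shown above. Other forms that schema.jq may emit are not handled, and an empty array falls back to "string" items, following the question's convention.

    ```shell
    # Hypothetical converter (not part of schema.jq): compact schema -> JSON Schema.
    filter='
    def toJsonSchema:
      if type == "object" then
        {"type": "object", "properties": map_values(toJsonSchema)}
      elif type == "array" then
        # Assumption: the first element is the item schema; empty arrays use "string".
        {"type": "array", "items": ((.[0] // "string") | toJsonSchema)}
      else
        # Assumption: any other value is a type name such as "string" or "number".
        {"type": .}
      end;
    .objectSchema | toJsonSchema
    '

    # The schema.jq output shown above, in compact form.
    input='{"objectSchema":{"fields":{"fieldArray":[{"a":"string","c":"string","x":"number","y":"string"}]}}}'

    result=$(printf '%s' "$input" | jq -c "$filter")
    echo "$result"
    ```

    In practice the same filter could be appended to the schema.jq invocation, e.g. by piping the output of the command above into a second jq call.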
    
    

    Notes:

    1. schema.jq is compatible with both gojq and jaq (the Go and Rust implementations of jq), but at the time of writing, jaq does not support the include statement. You could, however, leave schema.jq untouched by writing something like:
    < input.json jaq -f <(cat schema.jq; echo schema) 
    
    2. Disclaimer: schema.jq was written by yours truly.
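  2. Instead of add, you need to merge the per-item schemas with the * operator. add combines objects with +, which is a shallow merge, so each item's "properties" object replaces the previous one; * merges nested objects recursively: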

    jq '
    def convertToSchema:
      if type == "array" then
        if length == 0 then {"type": "array", "items": {"type": "string"}}
        else {"type": "array", "items":
              (reduce map(convertToSchema)[] as $i ({}; . *= $i))}
        end
      elif type == "object" then {
        "type": "object",
        "properties": map_values(convertToSchema)}
      elif type == "boolean" then {"type": "boolean"}
      elif type == "number" then {"type": "number"}
      elif type == "string" then {"type": "string"}
      elif type == "null" then {"type": "string"}
      else {"type": (type | tostring)} end;
    .objectSchema | convertToSchema
    ' input.json
    

    Output:

    {
      "type": "object",
      "properties": {
        "fields": {
          "type": "object",
          "properties": {
            "fieldArray": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "a": {
                    "type": "string"
                  },
                  "c": {
                    "type": "string"
                  },
                  "x": {
                    "type": "number"
                  },
                  "y": {
                    "type": "string"
                  }
                }
              }
            }
          }
        }
      }
    }
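
    To see the difference in isolation, here is a minimal sketch that folds two per-item schemas first with add and then with *. The two sample schemas are made up for illustration; they mimic what convertToSchema would return for two array elements with different keys.

    ```shell
    # Two per-item schemas, shaped like convertToSchema output for two array elements.
    items='[
      {"type": "object", "properties": {"a": {"type": "string"}}},
      {"type": "object", "properties": {"x": {"type": "number"}}}
    ]'

    # add folds with +, so the second "properties" object replaces the first.
    shallow=$(printf '%s' "$items" | jq -c 'add | .properties | keys')

    # Folding with * merges nested objects recursively, keeping keys from both.
    deep=$(printf '%s' "$items" | jq -c 'reduce .[] as $i ({}; . * $i) | .properties | keys')

    echo "add: $shallow"   # add: ["x"]
    echo "*:   $deep"      # *:   ["a","x"]
    ```

    Note that * still lets the right-hand value win when the same key holds non-object values, so if one item has "x" as a number and another has it as a string, only the last type survives.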
    