skip to Main Content

today i have a modification from another question that i did yesterday.
The challenge is: How convert a CSV into a hierarchical json but the keys and values are different. It must be done with PYTHON (pandas and json library)

So i have the next CSV file:

TABLE_NAME,TIPO_CARGA,COLUMN_NAME,DATA_TYPE
CUENTAS,Full load,PERFIL,NVARCHAR2
CUENTAS,Full load,LLAMADA_INI,DATE
CUENTAS,Full load,LLAMADA_END,DATE
CUENTAS_PLAN,Delta,REGISTRO,NVARCHAR2
CUENTAS_PLAN,Delta,ULTIMA_VISITA,DATE
CUENTAS_PLAN,Delta,FECHA,DATE
CUENTAS_TACTICAS,Delta,CIUDAD,NVARCHAR2
CUENTAS_TACTICAS,Delta,SALIDA,NVARCHAR2
CUENTAS_TACTICAS,Delta,COMPLETADO,DATE
CUENTAS_MIEMBROS,Full load,NEGOCIO,NVARCHAR2
CUENTAS_MIEMBROS,Full load,CORE,NVARCHAR2
CUENTAS_MIEMBROS,Full load,DESC,NVARCHAR2

The result must be something like this:

{
    "schema_name": {
        "CUENTAS": {
            "LOAD_TYPE": "Full load",
            "FIELDS": {
                "PERFIL": "NVARCHAR2",
                "LLAMADA_INI": "DATE",
                "LLAMADA_END": "DATE"
            }
        },
        "CUENTAS_PLAN": {
            "LOAD_TYPE": "Delta",
            "FIELDS": {
                "ULTIMA_VISITA": "DATE",
                "FECHA": "DATE",
                "CIUDAD": "NVARCHAR2"
            }
        },
        "CUENTAS_TACTICAS": {
            "LOAD_TYPE": "Delta",
            "FIELDS": {
                "CIUDAD": "NVARCHAR2",
                "SALIDA": "NVARCHAR2",
                "COMPLETADO": "DATE"
            }
        }
        {...}
    }
}

I prepare a python code to create a normal json structure like "key":"value", where key is a normalize text, and the value is the information from the CSV… but this does not meet the requirements.

Can somebody help me to create this python code?

Regards

2

Answers


  1. Use groupby and a dictionary comprehension:

    out = json.dumps({
        'schema_name': {
            n: {'LOAD_TYPE': l, 'FIELDS': g.to_dict()}
            for (n, l), g in df.set_index('COLUMN_NAME').groupby(
                ['TABLE_NAME', 'TIPO_CARGA']
            )['DATA_TYPE']
        }
    }, indent=2)
    

    Output:

    {
      "schema_name": {
        "CUENTAS": {
          "LOAD_TYPE": "Full load",
          "FIELDS": {
            "PERFIL": "NVARCHAR2",
            "LLAMADA_INI": "DATE",
            "LLAMADA_END": "DATE"
          }
        },
        "CUENTAS_MIEMBROS": {
          "LOAD_TYPE": "Full load",
          "FIELDS": {
            "NEGOCIO": "NVARCHAR2",
            "CORE": "NVARCHAR2",
            "DESC": "NVARCHAR2"
          }
        },
        "CUENTAS_PLAN": {
          "LOAD_TYPE": "Delta",
          "FIELDS": {
            "REGISTRO": "NVARCHAR2",
            "ULTIMA_VISITA": "DATE",
            "FECHA": "DATE"
          }
        },
        "CUENTAS_TACTICAS": {
          "LOAD_TYPE": "Delta",
          "FIELDS": {
            "CIUDAD": "NVARCHAR2",
            "SALIDA": "NVARCHAR2",
            "COMPLETADO": "DATE"
          }
        }
      }
    }
    
    Login or Signup to reply.
  2. Use nested dictionary comprehension with convert last columns to dictionary:

    import json
    
    out = (json.dumps({"schema_name": {n: {'LOAD_TYPE':l,'FIELDS': dict(g.to_numpy())}
                     for (n,l), g in 
                     df.groupby(['TABLE_NAME','TIPO_CARGA'], 
                                sort=False)[['COLUMN_NAME','DATA_TYPE']]}}))
    
    print (out)
    {
        "schema_name": {
            "CUENTAS": {
                "LOAD_TYPE": "Full load",
                "FIELDS": {
                    "PERFIL": "NVARCHAR2",
                    "LLAMADA_INI": "DATE",
                    "LLAMADA_END": "DATE"
                }
            },
            "CUENTAS_PLAN": {
                "LOAD_TYPE": "Delta",
                "FIELDS": {
                    "REGISTRO": "NVARCHAR2",
                    "ULTIMA_VISITA": "DATE",
                    "FECHA": "DATE"
                }
            }...
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search