
I have a use case where a text file holds data in a key-value format.
The file does not follow any fixed standard format; it is simply written as key=value lines.

We need to create JSON from that file.

I am able to create JSON, but when the text has an array-like structure, the result is just flat key-value JSON rather than a JSON array structure.

This is my input:

[DOCUMENT]
Headline=This is Headline
MainLanguage=EN
DocType.MxpCode=1000
Subject[0].MxpCode=BUSNES
Subject[1].MxpCode=CONS
Subject[2].MxpCode=ECOF
Author[0].MxpCode=6VL6
Industry[0].CtbCode=53
Industry[1].CtbCode=5340
Industry[2].CtbCode=534030
Industry[3].CtbCode=53403050
Symbol[0].Name=EXPE.OQ
Symbol[1].Name=ABNB.OQ
WorldReg[0].CtbCode=G4
Country[0].CtbCode=G26
Country[1].CtbCode=G2V
[ENDOFFILE]

The existing code to create the JSON is below:

import json

with open("file1.csv") as f:
    lines = f.readlines()
data = {}
for line in lines:
    parts = line.split('=')
    if len(parts) == 2:
        data[parts[0].strip()] = parts[1].strip()
print(json.dumps(data, indent='  '))

The current output is below

{
  "Headline": "This is Headline",
  "MainLanguage": "EN",
  "DocType.MxpCode": "1000",
  "Subject[0].MxpCode": "BUSNES",
  "Subject[1].MxpCode": "CONS",
  "Subject[2].MxpCode": "ECOF",
  "Author[0].MxpCode": "6VL6",
  "Industry[0].CtbCode": "53",
  "Industry[1].CtbCode": "5340",
  "Industry[2].CtbCode": "534030",
  "Industry[3].CtbCode": "53403050",
  "Symbol[0].Name": "EXPE.OQ",
  "Symbol[1].Name": "ABNB.OQ",
  "WorldReg[0].CtbCode": "G4",
  "Country[0].CtbCode": "G26",
  "Country[1].CtbCode": "G2V"
}

The expected output is something like the structure below, shown here for the Subject key; the other indexed keys should follow the same pattern.

{
  "subject": [
    {
      "mxcode": 123
    },
    {
      "mxcode": 123
    },
    {
      "mxcode": 123
    }
  ]
}

Likewise for Industry, Symbol, and Country.

So the idea is that when a key in the text file carries a position (an index), it should be treated as an array in the JSON output.

3 Answers


  1. Use one more loop, since the data is nested: start a for loop at the point where the Subject entries begin. Try it that way.

  2. The code below builds a dictionary in which the indexed entries are collected into arrays, and then prints it as JSON.

    You could condense this code to your needs.

    import json
    import re as regex
    from collections import defaultdict
    
    keys = set()
    data  = defaultdict(dict)
    
    elements = [element.strip().split('=') for element in open("sample_file.text")]
    for element in elements:
        # '[' and '.' are regex metacharacters, so they must be escaped to split on them literally
        main_topic = ''.join([regex.split(r'\[', item)[0] for item in element if '[' in item]).lower()
        sub_topic = ''.join([regex.split(r'\.', item)[1] for item in element if '[' in item and len(element) == 2]).lower()
        if main_topic and main_topic in keys and sub_topic:
            data[main_topic][sub_topic].append(element[1].strip())
        elif main_topic and main_topic not in keys and sub_topic:
            keys.add(main_topic)
            data[main_topic][sub_topic] = [element[1].strip()]
        elif not main_topic and main_topic not in keys:
            if ']' not in element[0]:
                main_topic = element[0].split('.')[0].lower()
                if '.' not in element[0]:
                    # Keep the value's original case, matching the output shown below
                    data[main_topic] = element[1].strip()
                elif '.' in element[0]:
                    sub_topic = element[0].split('.')[1].lower()
                    data[main_topic][sub_topic] = [element[1].strip()]
    
    print(json.dumps(data, indent='  '))
    

    JSON Output:

    {
      "headline": "This is Headline",
      "mainlanguage": "EN",
      "doctype": {
        "mxpcode": [
          "1000"
        ]
      },
      "subject": {
        "mxpcode": [
          "BUSNES",
          "CONS",
          "ECOF"
        ]
      },
      "author": {
        "mxpcode": [
          "6VL6"
        ]
      },
      "industry": {
        "ctbcode": [
          "53",
          "5340",
          "534030",
          "53403050"
        ]
      },
      "symbol": {
        "name": [
          "EXPE.OQ",
          "ABNB.OQ"
        ]
      },
      "worldreg": {
        "ctbcode": [
          "G4"
        ]
      },
      "country": {
        "ctbcode": [
          "G26",
          "G2V"
        ]
      }
    }
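
Note that this output groups the values per field (an object holding one array per sub-key), whereas the question asks for an array of objects. If that shape is needed, the grouped form can be converted afterwards. What follows is a minimal sketch, assuming all the lists under one topic have the same length; `to_array_of_objects` is a hypothetical helper name, not part of the answer above.

```python
def to_array_of_objects(grouped):
    """Turn {"topic": {"field": [v0, v1]}} into {"topic": [{"field": v0}, {"field": v1}]}."""
    result = {}
    for topic, fields in grouped.items():
        if isinstance(fields, dict):
            names = list(fields)
            # zip(*lists) pairs up the i-th value of every sub-key list
            result[topic] = [dict(zip(names, row)) for row in zip(*fields.values())]
        else:
            # Plain string values (e.g. "headline") are copied unchanged
            result[topic] = fields
    return result


grouped = {
    "headline": "This is Headline",
    "subject": {"mxpcode": ["BUSNES", "CONS", "ECOF"]},
}
reshaped = to_array_of_objects(grouped)
```

Because the answer's code wraps every dotted value in a list, a non-indexed key such as `doctype` would also become a one-element array under this conversion.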
    
    
  3. import re
    import json
    
    
    with open("file.csv") as f:
        lines = f.readlines()
    data = {}
    for line in lines:
        parts = line.split('=')
        if len(parts) == 2:
            key, value = parts[0].strip(), parts[1].strip()
            # Use regex to check if key has position in its name
            match = re.match(r'^(\w+)\[(\d+)\]\.(\w+)$', key)
            if match:
                array_key, index, obj_key = match.groups()
                # Create array of objects for the given key
                if array_key not in data:
                    data[array_key] = []
                # Create new object or update existing object at the given index
                if len(data[array_key]) <= int(index):
                    data[array_key].append({obj_key: value})
                else:
                    data[array_key][int(index)][obj_key] = value
            else:
                data[key] = value
    # Convert dictionary to JSON with proper indentation
    print(json.dumps(data, indent=2))

    The code is similar to the previous answer, but it follows your expected output more closely: each of Subject, Industry, etc. becomes an array of objects. Output:

    {
      "Headline": "This is Headline",
      "MainLanguage": "EN",
      "DocType.MxpCode": "1000",
      "Subject": [
        {
          "MxpCode": "BUSNES"
        },
        {
          "MxpCode": "CONS"
        },
        {
          "MxpCode": "ECOF"
        }
      ],
      "Author": [
        {
          "MxpCode": "6VL6"
        }
      ],
      "Industry": [
        {
          "CtbCode": "53"
        },
        {
          "CtbCode": "5340"
        },
        {
          "CtbCode": "534030"
        },
        {
          "CtbCode": "53403050"
        }
      ],
      "Symbol": [
        {
          "Name": "EXPE.OQ"
        },
        {
          "Name": "ABNB.OQ"
        }
      ],
      "WorldReg": [
        {
          "CtbCode": "G4"
        }
      ],
      "Country": [
        {
          "CtbCode": "G26"
        },
        {
          "CtbCode": "G2V"
        }
      ]
    }
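
The same idea can be taken a little further so that dotted keys without an index (such as DocType.MxpCode) also become nested objects, and so that indices arriving out of order do not raise an IndexError. The following is only a sketch under those assumptions, and it also assumes a key is never used both as a plain value and as a container; `set_path` and `parse_key_values` are hypothetical helper names, not taken from any answer above.

```python
import json
import re

# One path segment: a name with an optional [index], e.g. "Subject[0]" or "MxpCode"
SEGMENT = re.compile(r'^(\w+)(?:\[(\d+)\])?$')


def set_path(root, key, value):
    """Store value in root under a dotted key such as 'Subject[0].MxpCode'."""
    matches = [SEGMENT.match(part) for part in key.split('.')]
    if not all(matches):
        # Unrecognised key shape: keep it flat rather than guessing
        root[key] = value
        return
    node = root
    for pos, match in enumerate(matches):
        name, index = match.groups()
        last = pos == len(matches) - 1
        if index is None:
            if last:
                node[name] = value
            else:
                node = node.setdefault(name, {})
        else:
            items = node.setdefault(name, [])
            # Pad with empty objects so gaps and out-of-order indices are safe
            while len(items) <= int(index):
                items.append({})
            if last:
                items[int(index)] = value
            else:
                node = items[int(index)]


def parse_key_values(lines):
    data = {}
    for line in lines:
        # Split on the first '=' only, so values may themselves contain '='
        key, sep, value = line.strip().partition('=')
        if sep and key:
            set_path(data, key.strip(), value.strip())
    return data


sample = """[DOCUMENT]
Headline=This is Headline
DocType.MxpCode=1000
Subject[1].MxpCode=CONS
Subject[0].MxpCode=BUSNES
Symbol[0].Name=EXPE.OQ
[ENDOFFILE]"""

result = parse_key_values(sample.splitlines())
print(json.dumps(result, indent=2))
```

Marker lines such as [DOCUMENT] contain no '=' and are skipped. Splitting the key on '.' is safe here because only the key is split; a value like EXPE.OQ is stored untouched.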