
I am using Python to convert a text input file to JSON.

My code:

import json
import re

filename = "text.txt"
text = {}

pattern = re.compile(r'\s*([^=\t]+)\s*=\s*(.*)')

with open(filename, encoding='utf8') as file:
    for line in file:
        match = pattern.match(line.strip())
        if match:
            key, value = match.groups()
            text[key] = value
        else:
            key_value = line.strip().rsplit(maxsplit=1)
            if len(key_value) == 2:
                key, value = key_value
                text[key] = value

with open("output.json", "w", encoding='utf-8') as output_file:
    json.dump(text, output_file, indent=4, ensure_ascii=False, sort_keys=False)

I am using a regular expression for this. Here is my input:

I_KNO_DR=456
I_ff_DD=567
hello 23
hello world 34
Y=hi /// rtz 77

The current output is:

{
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y": "hi /// rtz 77"
}

But the expected output is:

{
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y=hi /// rtz": "77"
}

The problem is with the last line of the input and output. How can I get the correct output? What mistake am I making in the current code? Please also suggest any improvements I could make.

Thanks.

2 Answers


  1. I’d change the regular expression to:

    (.+)(?:[ \t]*=[ \t]*|[ \t]+)(.+)
    

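    Because the first group (.+) is greedy, the split happens at the last = or whitespace run on each line, so only the final part becomes the value (Regex101 demo).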

    import re
    
    input_string = """
    I_KNO_DR=456
    I_ff_DD=567
    hello 23
    hello world 34
    Y=hi /// rtz 77"""
    
    out = dict(re.findall(r"(.+)(?:[ \t]*=[ \t]*|[ \t]+)(.+)", input_string))
    print(out)
    

    Prints:

    {'I_KNO_DR': '456', 'I_ff_DD': '567', 'hello': '23', 'hello world': '34', 'Y=hi /// rtz': '77'}
    
  2. import json
    import re
    
    filename = "text.txt"
    text = {}
    
    # Key: everything before the "=" or whitespace that precedes the trailing number
    pattern = re.compile(r'^(.*?)[=\s]+(\d+)$')
    
    with open(filename, encoding='utf8') as file:
        for line in file:
            match = pattern.match(line.strip())
            if match:  # skip blank lines and lines without a trailing number
                key, value = match.groups()
                text[key] = value
    
    with open("output.json", "w", encoding='utf-8') as output_file:
        json.dump(text, output_file, indent=4, ensure_ascii=False, sort_keys=False)
    


    Test it on your text.txt file, and if any line produces unexpected output, please say so!

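For reference, here is why the question's code gives the wrong last entry, plus a minimal sketch of one possible fix. In the question's pattern, ([^=\t]+)\s*= matches any line that contains an "=", and the key ends at the first "=", so "Y=hi /// rtz 77" is split into "Y" and "hi /// rtz 77" and the rsplit() fallback is never reached. The sketch below instead splits at the last separator, assuming every value is a single trailing token with no spaces or "=" inside it. OLD_RE, LINE_RE and parse_lines are illustrative names, not taken from either answer.

```python
import json
import re

# The question's pattern: any line containing "=" matches here, and the key
# stops at the FIRST "=", so the rsplit() fallback never sees such a line.
OLD_RE = re.compile(r"\s*([^=\t]+)\s*=\s*(.*)")
print(OLD_RE.match("Y=hi /// rtz 77").groups())  # ('Y', 'hi /// rtz 77')

# Hypothetical replacement pattern:
#   key       - anything that ends in a character other than whitespace or "="
#   separator - "=" (optionally padded with whitespace) or a run of whitespace
#   value     - the final token, containing no whitespace and no "="
# The greedy key group pushes the split to the LAST separator on the line.
LINE_RE = re.compile(r"(.*[^\s=])(?:\s*=\s*|\s+)([^\s=]+)")


def parse_lines(lines):
    """Build a dict from 'key=value' or 'key value' lines, skipping the rest."""
    result = {}
    for line in lines:
        match = LINE_RE.fullmatch(line.strip())
        if match:  # blank or malformed lines are skipped instead of crashing
            key, value = match.groups()
            result[key] = value
    return result


sample = """I_KNO_DR=456
I_ff_DD=567
hello 23
hello world 34
Y=hi /// rtz 77"""

data = parse_lines(sample.splitlines())
print(json.dumps(data, indent=4, ensure_ascii=False))
```

To write the file, pass data to json.dump as in the question. Using fullmatch() on the stripped line means the pattern can never accept just part of a line, and lines that fit neither form are skipped rather than raising AttributeError from calling .groups() on None. Because the key may not end in "=" or whitespace, a padded line such as "x = 5" also gives the key "x".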