Convert the text input file to JSON in Python

harry
December 13, 2023
176 views
0 votes
2 Answers

I am using Python to convert a text input file to JSON.

My code:

import json
import re

filename = "text.txt"
text = {}

pattern = re.compile(r'\s*([^=\t]+)\s*=\s*(.*)')

with open(filename, encoding='utf8') as file:
    for line in file:
        match = pattern.match(line.strip())
        if match:
            key, value = match.groups()
            text[key] = value
        else:
            key_value = line.strip().rsplit(maxsplit=1)
            if len(key_value) == 2:
                key, value = key_value
                text[key] = value

with open("output.json", "w", encoding='utf-8') as output_file:
    json.dump(text, output_file, indent=4, ensure_ascii=False, sort_keys=False)

I am using a regular expression for this operation. This is the input I am giving it:

I_KNO_DR=456
I_ff_DD=567
hello 23
hello world 34
Y=hi /// rtz 77

The current output is as follows:

{
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y": "hi /// rtz 77"
}

But the expected output is:

{
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y=hi /// rtz": "77"
}

The problem is in the last line of the input and output. How can I get the expected output, and what is the mistake in my current code? Please also suggest any improvements I could make.

Thanks.

Answers

- AndrejKesely
- December 12, 2023 at 10:36 pm
- 0 votes
I’d change the regular expression to:
```
(.+)(?:[ \t]*=[ \t]*|[ \t]+)(.+)
```
That way the first group is greedy, so the value is whatever follows the last = or space on the line (Regex101).
```
import re

input_string = """
I_KNO_DR=456
I_ff_DD=567
hello 23
hello world 34
Y=hi /// rtz 77"""

out = dict(re.findall(r"(.+)(?:[ \t]*=[ \t]*|[ \t]+)(.+)", input_string))
print(out)
```
Prints:
```
{
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y=hi /// rtz": "77",
}
```
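To apply the same pattern to a file and write JSON, as the question does, a minimal sketch might look like the following. The function names `parse_lines` and `text_to_json` are hypothetical, and matching each line with `re.fullmatch` instead of calling `re.findall` on the whole text is an assumption made for illustration; the file names are the ones from the question.

```python
import json
import re

# The first group is greedy, so the split happens at the LAST "=" or
# space/tab run on the line.
PATTERN = re.compile(r"(.+)(?:[ \t]*=[ \t]*|[ \t]+)(.+)")


def parse_lines(lines):
    """Map each line's key (everything before the last separator) to its value."""
    result = {}
    for line in lines:
        match = PATTERN.fullmatch(line.strip())
        if match:  # blank lines and single-token lines are skipped
            key, value = match.groups()
            result[key] = value
    return result


def text_to_json(src="text.txt", dst="output.json"):
    """Read src line by line, parse it, and write the result to dst as JSON."""
    with open(src, encoding="utf-8") as infile:
        data = parse_lines(infile)
    with open(dst, "w", encoding="utf-8") as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
    return data
```

Because the first group is greedy, extra spaces in front of the separator stay in the key (for example, two spaces between `hello` and `23` give the key `"hello "`); strip the key if that matters for your data.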

- YiYang
- December 13, 2023 at 1:17 am
- 0 votes
```
import json
import re

filename = "text.txt"
text = {}

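# lazy key, then the trailing run of digits as the value
pattern = re.compile(r'^(.*?)(\d+)$')

with open(filename, encoding='utf8') as file:
  for line in file:
    match = pattern.match(line.strip())
    if not match:
      continue  # skip blank lines and lines that do not end in digits
    key, value = match.groups()
    # drop the trailing separator (spaces, tabs, or "=") left on the key
    text[key.rstrip(" \t=")] = value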

with open("output.json", "w", encoding='utf-8') as output_file:
  json.dump(text, output_file, indent=4, ensure_ascii=False, sort_keys=False)
```
Test it with your text.txt file; if any line gives output that is not as expected, please say so!
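One way to run that check without touching the file is to feed the sample lines from the question through the pattern and compare the result with the expected output. This is only a sketch: the `parse` helper is a hypothetical stand-in for the loop above, with the trailing separator stripped from the key, and `SAMPLE` and `EXPECTED` are copied from the question.

```python
import re

# Lazy key, then the trailing run of digits as the value.
PATTERN = re.compile(r"^(.*?)(\d+)$")


def parse(lines):
    """Parse lines into a dict; lines that do not end in digits are skipped."""
    result = {}
    for line in lines:
        match = PATTERN.match(line.strip())
        if match is None:
            continue
        key, value = match.groups()
        # Remove the separator (spaces, tabs, or "=") left at the end of the key.
        result[key.rstrip(" \t=")] = value
    return result


SAMPLE = """I_KNO_DR=456
I_ff_DD=567
hello 23
hello world 34
Y=hi /// rtz 77"""

EXPECTED = {
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y=hi /// rtz": "77",
}

actual = parse(SAMPLE.splitlines())
# Report every key whose value differs between the two dicts.
for key in sorted(EXPECTED.keys() | actual.keys()):
    if EXPECTED.get(key) != actual.get(key):
        print(f"mismatch for {key!r}: expected {EXPECTED.get(key)!r}, got {actual.get(key)!r}")
```

Note that this pattern only matches lines whose value consists of digits, so a line such as `name=abc` would be skipped.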