I am using Python to convert a text input file to JSON.
My code:
import json
import re

filename = "text.txt"
text = {}
# Key: everything before the first "=" (no tabs); value: the rest of the line
pattern = re.compile(r'\s*([^=\t]+)\s*=\s*(.*)')

with open(filename, encoding='utf8') as file:
    for line in file:
        match = pattern.match(line.strip())
        if match:
            key, value = match.groups()
            text[key] = value
        else:
            # No "=" in the line: split off the last whitespace-separated token
            key_value = line.strip().rsplit(maxsplit=1)
            if len(key_value) == 2:
                key, value = key_value
                text[key] = value

with open("output.json", "w", encoding='utf-8') as output_file:
    json.dump(text, output_file, indent=4, ensure_ascii=False, sort_keys=False)
I am using a regular expression for this. The input is:
I_KNO_DR=456
I_ff_DD=567
hello 23
hello world 34
Y=hi /// rtz 77
The current output is:
{
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y": "hi /// rtz 77"
}
But the expected output is:
{
    "I_KNO_DR": "456",
    "I_ff_DD": "567",
    "hello": "23",
    "hello world": "34",
    "Y=hi /// rtz": "77"
}
The problem is with the last line of the input and its output. How can I get the expected output, and what is the mistake in my current code? Suggestions for other improvements are also welcome.
Thanks.
2 Answers
I’d change the regular expression to:
That way you match the last part after "=" or a space (Regex101).
Prints:
Test it against your text.txt file. If any lines produce output that is not as expected, please say so!
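For reference, here is a minimal sketch of the "match the last part after = or space" idea. The question's pattern splits at the first "=", so any line containing "=" never reaches the rsplit fallback, which is why the last line comes out as "Y": "hi /// rtz 77". The pattern below is an assumption that fits the description above, not necessarily the exact regex from this answer, and the names LAST_TOKEN and parse_lines are made up for illustration. It takes the key lazily and anchors the value to the final token that contains neither whitespace nor "=".

```python
import json
import re

# Assumed pattern (illustrative, not the answer's original regex):
#   (.+?)       key, as short as possible
#   [\s=]+      the final run of whitespace and/or "=" acting as separator
#   ([^\s=]+)   value: the last token, containing no whitespace or "="
LAST_TOKEN = re.compile(r'(.+?)[\s=]+([^\s=]+)')


def parse_lines(lines):
    """Map each line to key -> value, where value is the line's last token."""
    result = {}
    for line in lines:
        # fullmatch forces the value group to end at the end of the line
        match = LAST_TOKEN.fullmatch(line.strip())
        if match:
            key, value = match.groups()
            result[key] = value
    return result


sample = [
    "I_KNO_DR=456",
    "I_ff_DD=567",
    "hello 23",
    "hello world 34",
    "Y=hi /// rtz 77",
]
parsed = parse_lines(sample)
print(json.dumps(parsed, indent=4, ensure_ascii=False))
```

Because a single pattern handles both separators, the separate rsplit branch is no longer needed. Lines with only one token (no separator at all) are skipped, which matches the behaviour of the original code.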