skip to Main Content

I’m trying to read a file with multiple columns. One such column is named ‘answer’, containing values that pretty much are Python dictionaries.

Values include:

{'number': '2', 'date': {'day': '', 'month': '', 'year': ''}, 'spans': []}

and

{'number': '', 'date': {'day': '', 'month': '', 'year': ''}, 'spans': ['4-yard', '31-yard']}

A particular row’s value in the ‘answer’ column is when printed onto the console displayed as

'{'number': '', 'date': {'day': '', 'month': '', 'year': ''}, 'spans': ["Barlow's 1-yard touchdown run", '2-yard touchdown run', 'by rookie RB Joseph Addai.']}'

In the csv, it looks like

{'number': '', 'date': {'day': '', 'month': '', 'year': ''}, 'spans': ["Barlow's 1-yard touchdown run", '2-yard touchdown run', 'by rookie RB Joseph Addai.']}

After conversion to a valid JSON string and printed onto the console, it looks like

'{"number": "", "date": {"day": "", "month": "", "year": ""}, "spans": ["Barlow"s 1-yard touchdown run", "2-yard touchdown run", "by rookie RB Joseph Addai."]}'

To convert the strings to dictionaries, I tried using the json.loads(string) method.

Here is what I did:

for i in range(df.shape[0]):
    dict = df.iloc[i]['answer']
    dict = dict.replace("'", '"')
    # To convert it to a valid JSON string
    dict = json.loads(dict)
    ans[i] = dict['number']

The following error appears for the third example given above, but not the other two:

JSONDecodeError: Expecting ',' delimiter: line 1 column 80 (char 79)

It fails to convert the string into a dictionary for reasons unknown to me.

What can I do to rectify this error?

Is there any method to read the ‘answer’ column as a dictionary, instead of having to read it as a string and then convert said string to a dictionary?

2

Answers


  1. Chosen as BEST ANSWER

    I came to know that the format of the strings while looking like a JSON dictionary, matched perfectly with the format of a Python dictionary and could easily be parsed with the in-built eval() function.

    Using the template of Serge above,

    def convert(string):
        try:
            string = eval(string)
            return string
        except Exception as e:
            print("Error processing string:", string)
            raise e
    
    converted = df['answer'].apply(convert)
    
    print(converted)
    

    Which printed

    0        {'number': '19', 'date': {'day': '', 'month': ...
    1        {'number': '2', 'date': {'day': '', 'month': '...
    2        {'number': '99.9', 'date': {'day': '', 'month'...
    3        {'number': '29826', 'date': {'day': '', 'month...
    4        {'number': '30', 'date': {'day': '', 'month': ...
                                   ...                        
    60852    {'number': '', 'date': {'day': '', 'month': ''...
    60853    {'number': '', 'date': {'day': '', 'month': ''...
    60854    {'number': '', 'date': {'day': '', 'month': ''...
    60855    {'number': '', 'date': {'day': '', 'month': ''...
    60856    {'number': '', 'date': {'day': '', 'month': ''...
    Name: answer, Length: 60857, dtype: object
    

    with absolutely no errors.


  2. As I mentioned in my comment, your error JSONDecodeError: Expecting ',' delimiter: line 1 column 80 (char 79) comes from the 'Barlow"s'. Now, to avoid these errors you’ll need to replace any instances of " occuring between letters. Here is a function that will handle these cases.

    import pandas as pd
    import json
    import re
    
    data = {
        'answer': [
            "{'number': '2', 'date': {'day': '', 'month': '', 'year': ''}, 'spans': []}",
            "{'number': '', 'date': {'day': '', 'month': '', 'year': ''}, 'spans': ['4-yard', '31-yard']}",
            '{'number': '', 'date': {'day': '', 'month': '', 'year': ''' 
            '}, 'spans': ["Barlow's 1-yard touchdown run", "2-yard touchdown run", "by rookie RB Joseph Addai."]}'
        ]
    }
    
    df = pd.DataFrame(data)
    
    def clean_and_convert(string):
        try:
            modified = string.replace("'", '"')
            modified = re.sub(r'(?<!\)"', r'\"', modified) 
            modified = modified.replace('\"', '"') 
            modified = re.sub(r'(w)"(w)', r'1\" 2', modified)
            
            print("Final modified string for JSON loading:", modified)
            
            return json.loads(modified)
        except json.JSONDecodeError as e:
            print("Error processing string:", string)
            print("Modified string that caused error:", modified)
            raise e
    
    df['converted'] = df['answer'].apply(clean_and_convert)
    
    df[['answer', 'converted']]
    

    This will gives you:

                                                 answer  
    0  {'number': '2', 'date': {'day': '', 'month': '...   
    1  {'number': '', 'date': {'day': '', 'month': ''...   
    2  {'number': '', 'date': {'day': '', 'month': ''...   
    
                                               converted  
    0  {'number': '2', 'date': {'day': '', 'month': '...  
    1  {'number': '', 'date': {'day': '', 'month': ''...  
    2  {'number': '', 'date': {'day': '', 'month': ''...  
    

    Here is the print-out of the changes

    Final modified string for JSON loading: {"number": "2", "date": {"day": "", "month": "", "year": ""}, "spans": []}
    Final modified string for JSON loading: {"number": "", "date": {"day": "", "month": "", "year": ""}, "spans": ["4-yard", "31-yard"]}
    Final modified string for JSON loading: {"number": "", "date": {"day": "", "month": "", "year": ""}, "spans": ["Barlow" s 1-yard touchdown run", "2-yard touchdown run", "by rookie RB Joseph Addai."]}
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search