
I have this almost-JSON log line, which contains something only similar to JSON inside one of its string values:

TEST_LINE = """Oct 21 22:39:28 GMT [TRACE] (Carlos-288) org.some.awesome.LoggerFramework RID=8e9076-4dd9-ec96-8f35-bde193498f: {
    "service": "MyService",
    "operation": "queryShowSize",
    "requestID": "8e9076-4dd9-ec96-8f35-bde193498f",
    "timestamp": 1634815968000,
    "parameters": [
        {
            "__type": "org.some.awsome.code.service#queryShowSizeRequest",
            "externalID": {
                "__type": "org.some.awsome.code.common#CustomerID",
                "value": "48317"
            },
            "CountryID": {
                "__type": "org.some.awsome.code.common#CountryID",
                "value": "125"
            },
            "operationOriginalDate": 1.63462085667E9,
            "operationType": "MeasureWithToes",
            "measureInstrumentIdentifier": "595909-48d2-6115-85e8-b3aa7b"
        }
    ],
    "output": {
        "__type": "org.some.awsome.code.common#queryShowSizeReply",
        "shoeSize": {
            "value": "$ion_1_0 '[email protected]'::'[email protected]'::{customer_id:\"983017317\",measureInstrumentIdentifierTilda:\"595909-48d2-6115-85e8-b3aa7b\",foot_owner:\"Oedipus\",toe_code:\"LR2X10\",account_number_token:\"1234-2838316-1298470\",token_status:VALID,country_code:GRC,measure_store_format:METRIC}"
        }
    }
}
"""

The regex gives me the start of the JSON and I try decoding from there. According to https://jsonlint.com/, it is valid JSON after that point.

So why doesn’t Python’s JSON module decode it? I get this error:

Exception has occurred: JSONDecodeError
Expecting ',' delimiter: line 25 column 156 (char 992)
  File "/Users/decoder/Downloads/json-problem.py", line 44, in read_json
    d = json.loads(line)
        ^^^^^^^^^^^^^^^^
  File "/Users/decoder/Downloads/json-problem.py", line 48, in <module>
    print(read_json(TEST_LINE))
          ^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 25 column 156 (char 992)

Line 25, column 156 points to the first " inside output.shoeSize.value.

But why? That embedded value is only roughly JSON but it should not try to decode it anyway as it is given as a plain string. And the quotes are nicely escaped to not end the string early.
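One way to check what the decoder actually receives is to catch the JSONDecodeError and look at the text around its pos attribute. The sketch below uses a hypothetical miniature of the payload; the names doc and locate_error are illustrative and not part of the original script. Like TEST_LINE, doc is written as a regular (non-raw) triple-quoted literal:

```python
import json

# Hypothetical miniature of the payload, in a regular (non-raw) literal.
doc = """
{"value": "{customer_id:\"983017317\"}"}
"""

def locate_error(text: str, width: int = 15):
    """Return (message, position, surrounding text) for a failed decode, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        return e.msg, e.pos, text[max(0, e.pos - width): e.pos + width]
    return None

print("\\" in doc)        # False: no backslash is left in the string
print(locate_error(doc))  # message, offset, and the text around the failure
```

If the first print shows False, the backslashes never reached json.loads, whatever the source file looks like in the editor.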

import json
import re

FIND_JSON = re.compile(
    r"\w{3} \d{2} (\d{2}[: ]){3}GMT \[[^]]+\] \([^)]+\) "
    r"org.some.awesome.LoggerFramework RID=[^:]+: "
)

def read_json(line: str) -> dict | None:
    # Skip the log prefix, then decode everything after it as JSON.
    if not (m := FIND_JSON.match(line)):
        return None
    line = line[m.end(0) :]
    d = json.loads(line)
    return d


print(read_json(TEST_LINE))

I’ve also tried raw_decode(), but that fails in the same way. I don’t understand.
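For reference, json.JSONDecoder.raw_decode(s, idx) runs the same parser as json.loads. What it adds is a start offset and tolerance for trailing text, so it cannot rescue a string whose content is already invalid. A minimal sketch, where decode_from and the sample line are hypothetical:

```python
import json

def decode_from(text: str, start: int):
    """Decode the first JSON value beginning exactly at `start`.

    Returns (object, end_offset); anything after end_offset is ignored.
    """
    return json.JSONDecoder().raw_decode(text, start)

# Hypothetical log line: a prefix, a JSON object, then trailing text.
line = 'RID=abc: {"service": "MyService"} trailing text'
obj, end = decode_from(line, line.index("{"))
print(obj)         # the decoded dict
print(line[end:])  # whatever followed the JSON value
```

Note that the offset has to point at the first character of the JSON value; raw_decode does not skip leading whitespace.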

Update 1: To the commenter pointing to a non-escaped double quote, I don’t see it. For me after the colon it follows a backslash and then a double quote. And—again—for me, the linter tells me it’s good. Is there some copy & paste transformation happening on SO?

Update 2: Added the (still missing) code that makes the problem apparent.

2 Answers


  1. The issue you’re encountering is because the embedded string within the "shoeSize" key contains single quotes (‘) which aren’t properly escaped in the JSON. The JSON standard only considers double quotes (") as valid for string delimiters, and your embedded string is mixing both single and double quotes.

    To make the JSON valid and decodable by Python’s json module, you can replace the single quotes with escaped double quotes in the "shoeSize" value. Here’s a modified version of the "shoeSize" key-value pair:

    
    "shoeSize": {
        "value": "$ion_1_0 \"[email protected]\"::\"[email protected]\"::{customer_id:\"983017317\",measureInstrumentIdentifierTilda:\"595909-48d2-6115-85e8-b3aa7b\",foot_owner:\"Oedipus\",toe_code:\"LR2X10\",account_number_token:\"1234-2838316-1298470\",token_status:VALID,country_code:GRC,measure_store_format:METRIC}"
    }
    

    After making these changes, you should be able to decode the JSON using Python’s json module:

    
    import json
    
    # Assuming modified_json contains the modified JSON string
    d = json.loads(modified_json)
    
    
  2. It’s not 100% clear how you’re defining the string, but the issue is likely that the escaping of the quotes is processed and removed by Python before the string is fed into the json library:

    data = """{
    "value": "{customer_id:\"983017317\"..."
    }
    """
    print(data)
    

    prints:

    {
    "value": "{customer_id:"983017317"..."
    }
    

    See, the escaping is gone. To have Python leave the escapes alone so that the json library processes them instead, declare the literal as a raw string with the r prefix (r"your_string"), i.e.:

    data = r"""{
    "value": "{customer_id:\"983017317\"..."
    }"""
    print(data)
    

    prints:

    {
    "value": "{customer_id:\"983017317\"..."
    }
    

    which you can then feed into json.loads() without any issues.
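Putting the raw-string idea together with the question’s prefix regex might look like the sketch below. SAMPLE is a shortened, hypothetical stand-in for the real log line, and the pattern assumes the backslash escapes (\w, \d, \[, \() that the question’s regex appears to intend:

```python
import json
import re

# Raw triple-quoted literal: Python keeps \" as two characters,
# so the json module receives the escapes and can interpret them.
SAMPLE = r"""Oct 21 22:39:28 GMT [TRACE] (Carlos-288) org.some.awesome.LoggerFramework RID=8e9076: {
    "output": {"shoeSize": {"value": "{customer_id:\"983017317\",token_status:VALID}"}}
}
"""

FIND_JSON = re.compile(
    r"\w{3} \d{2} (\d{2}[: ]){3}GMT \[[^]]+\] \([^)]+\) "
    r"org\.some\.awesome\.LoggerFramework RID=[^:]+: "
)

def read_json(line: str):
    """Strip the log prefix and decode the remainder; None if the prefix is absent."""
    m = FIND_JSON.match(line)
    if m is None:
        return None
    return json.loads(line[m.end():])

print(read_json(SAMPLE)["output"]["shoeSize"]["value"])
```

This only matters for literals typed into source code. Text read from a log file is not subject to Python’s escape processing, so it already contains the backslashes as written.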
