I am making an AJAX call to an endpoint (I didn't create this API) whose response is JSON. Within the JSON there is a key called content, whose value is a string. This content appears to be HTML data that contains some JSON inside it. I want to parse the JSON embedded in that HTML, but I keep getting the following error when I call json.loads() on the string:
{JSONDecodeError}JSONDecodeError('Expecting property name enclosed in double quotes: line 1 column 2 (char 1)')
and I don't really understand why I am getting this error.
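A minimal reproduction, assuming the content value escapes every double quote of the inner JSON with a backslash (the short sample string is hypothetical, not the real API data). json.loads finds a backslash at index 1 where it expects the opening quote of a key, which is exactly the position the error reports.

```python
import json

# Hypothetical, shortened stand-in for the API's "content" value:
# every double quote of the inner JSON is preceded by a backslash.
content = r'{\"name\":\"ThreadMainListItemNormalizer\",\"props\":{}}'

def first_decode_error(text):
    """Return the JSONDecodeError raised by json.loads(text), or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc
    return None

err = first_decode_error(content)
print(err)  # Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
print(repr(content[err.pos]))  # the offending character is a backslash
```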
Here is the JSON string I am trying to parse:
{"name":"ThreadMainListItemNormalizer","props":{"thread":{"threadId":4369992,"threadTypeId":1,"titleSlug":"sebamed-sale-extra-soft-baby-cream-ps239-anti-dandruff-shampoo-ps387","title":"Sebamed sale - extra soft baby cream £2.39 / anti dandruff shampoo £3.87","currentUserVoteDirection":"","commentCount":0,"status":"Activated","isExpired":false,"isNew":true,"isPinned":false,"isTrending":null,"isBookmarked":false,"isLocal":false,"temperature":0,"temperatureLevel":"","type":"Deal","nsfw":false,"deletedAt":null,"publishedAt":1720003748,"voucherCode":"","link":"https://www.justmylook.com/sebamed-m583","merchant":{"merchantId":45518,"merchantName":"Justmylook","merchantUrlName":"justmylook.co.uk","isMerchantPageEnabled":true},"price":2.39,"nextBestPrice":0,"percentage":0,"discountType":null,"shipping":{"isFree":1,"price":0},"user":{"userId":2701300,"username":"Manish_N","title":"","avatar":{"path":"users/raw/default","name":"2701300_6","slotId":"default","width":0,"height":0,"version":6,"unattached":false,"uid":"2701300_6.raw","ext":"raw"},"persona":{"text":null,"type":null},"isBanned":false,"isDeletedOrPendingDeletion":false,"isUserProfileHidden":false}}}}
If I paste the above string into an online JSON validator, it reports that it is invalid JSON. However, when I unescape it using an online unescape tool, I get the following output:
{"name":"ThreadMainListItemNormalizer","props":{"thread":{"threadId":4369991,"threadTypeId":1,"titleSlug":"samsung-55-qn700c-neo-qled-8k-hdr-smart-tv","title":"Samsung 55\" QN700C Neo QLED 8K HDR Smart TV Sold by Reliant Direct FBA","currentUserVoteDirection":"","commentCount":0,"status":"Activated","isExpired":false,"isNew":true,"isPinned":false,"isTrending":null,"isBookmarked":false,"isLocal":false,"temperature":0.59,"temperatureLevel":"Hot1","type":"Deal","nsfw":false,"deletedAt":null,"publishedAt":1720003637,"voucherCode":"","link":"https://www.amazon.co.uk/dp/B0BWFNLPTP?smid=A2CN43WDI0AWCL","merchant":{"merchantId":1650,"merchantName":"Amazon","merchantUrlName":"amazon-uk","isMerchantPageEnabled":true},"price":999,"nextBestPrice":1198,"percentage":0,"discountType":null,"shipping":{"isFree":1,"price":0},"user":{"userId":2679277,"username":"ben.jammin","title":"","avatar":{"path":"users/raw/default","name":"2679277_1","slotId":"default","width":0,"height":0,"version":1,"unattached":false,"uid":"2679277_1.raw","ext":"raw"},"persona":{"text":null,"type":null},"isBanned":false,"isDeletedOrPendingDeletion":false,"isUserProfileHidden":false}}}}
which is valid JSON. My problem arises when I try to replicate what the unescape tool does in Python.
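One way to replicate the unescape step in Python, assuming the online tool is simply undoing JSON string escaping (\" to ", \\ to \, \uXXXX to a character): wrap the text in double quotes so it becomes a JSON string literal, decode it with json.loads, then parse the result a second time. The helper name unescape_json_string and the sample string are hypothetical.

```python
import json

def unescape_json_string(escaped: str) -> str:
    # Treat the text as the body of a JSON string literal, so json.loads
    # turns \" into ", \\ into \ and \uXXXX into the matching character.
    return json.loads('"' + escaped + '"')

# Hypothetical sample: a title containing a quote is escaped twice (\\\").
content = r'{\"title\":\"Samsung 55\\\" TV\",\"price\":999}'

inner = unescape_json_string(content)  # one level of escaping removed
data = json.loads(inner)               # now an ordinary JSON document
print(data["title"])  # Samsung 55" TV
```

Because this uses JSON's own escape rules, a backslash followed by a character JSON does not recognise (such as £) still raises an Invalid \escape error, so such escapes have to be dealt with before this step.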
I have tried the following solutions:
- Using ast.literal_eval(), but I get the following error:
{SyntaxError}SyntaxError('unexpected character after line continuation character', ('<unknown>', 1, 3, '{\"name\":\"ThreadMainListItemNo...:null,\"type\":null},\"isBanned\":false,\"isDeletedOrPendingDeletion\":false,\"isUserProfileHidden\":false}}}}', 1, 0))
- Using the .encode('raw_unicode_escape').decode('unicode_escape') method suggested elsewhere, but after calling json.loads() on the unescaped string I get the following error:
{JSONDecodeError}JSONDecodeError('Invalid \escape: line 1 column 224 (char 223)')
Here is the full API response, as requested. I am interested in the value of the content key.
UPDATE:
I think the issue is that the string contains some invalid escape sequences, e.g. \£. I followed an existing solution for this and it resolved my issue.
Does anyone have any idea why this API might be including an escaped £ symbol?
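A sketch of one way to clean up such invalid escapes before parsing (not necessarily the solution referred to above): walk the backslash pairs from left to right, keep the ones JSON recognises and drop the backslash from everything else, so \£ becomes £ while a genuine \\ or \" survives. The helper name drop_invalid_escapes and the sample input are hypothetical, and the input is assumed to be the already-unescaped text.

```python
import json
import re

# Characters that may legally follow a backslash inside a JSON string.
VALID_JSON_ESCAPES = set('"\\/bfnrtu')

def drop_invalid_escapes(text: str) -> str:
    """Remove the backslash from any escape JSON does not recognise."""
    def fix(match):
        char = match.group(1)
        return match.group(0) if char in VALID_JSON_ESCAPES else char
    # Matching a backslash plus the next character consumes pairs left to
    # right, so an escaped backslash (\\) is never split apart.
    return re.sub(r'\\(.)', fix, text, flags=re.DOTALL)

# Hypothetical already-unescaped text: a stray \£ and a legitimate \\.
unescaped = '{"title":"baby cream \\£2.39","note":"a \\\\ b"}'
data = json.loads(drop_invalid_escapes(unescaped))
print(data["title"])  # baby cream £2.39
print(data["note"])   # a \ b
```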
2 Answers
Just use a plain str.replace; it seems good enough in this case. Even if there is an escaped backslash before a quote, as in a\\"b, the replace strategy will still keep one backslash character.
Here is one way to handle it:
Result in terminal:
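A minimal sketch of the plain str.replace approach, assuming the only extra escaping in the content value is a backslash in front of each double quote (the helper name unescape_quotes and the sample strings are hypothetical):

```python
import json

def unescape_quotes(content: str) -> str:
    # Turn every \" back into a plain double quote; other backslashes stay.
    return content.replace('\\"', '"')

content = r'{\"name\":\"ThreadMainListItemNormalizer\",\"price\":2.39}'
data = json.loads(unescape_quotes(content))
print(data["name"], data["price"])  # ThreadMainListItemNormalizer 2.39

# An escaped backslash before a quote keeps one backslash.
print(unescape_quotes(r'a\\"b'))  # a\"b

# Caveat: a quote inside a value (escaped as \\\") ends up as \\", which
# closes the JSON string early, so json.loads would fail on this result.
tricky = unescape_quotes(r'{\"title\":\"Samsung 55\\\" TV\"}')
print(tricky)  # {"title":"Samsung 55\\" TV"}
```

For data where values can themselves contain quotes, decoding the text as a JSON string literal with json.loads('"' + content + '"') is the safer option, since it removes exactly one level of escaping.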