I cannot read the full text of this JSON file:
{
  "messages": [
    {
      "sender_name": "test",
      "timestamp_ms": 1554347140802,
      "content": "Ch\u00c3\u00a0o Anh/Ch\u00e1\u00bb\u008b, Anh/Ch\u00e1\u00bb\u008b vui l\u00c3\u00b2ng \u00c4\u0091\u00e1\u00bb\u0083 l\u00e1\u00ba\u00a1i S\u00e1\u00bb\u0090 \u00c4\u0090I\u00e1\u00bb\u0086N THO\u00e1\u00ba\u00a0I + T\u00c3\u008cNH TR\u00e1\u00ba\u00a0NG B\u00e1\u00bb\u0086NH \u00c4\u0091\u00e1\u00bb\u0083 D\u00c6\u00af\u00e1\u00bb\u00a2C S\u00c4\u00a8 CHUY\u00c3\u008aN M\u00c3\u0094N s\u00e1\u00ba\u00afp x\u00e1\u00ba\u00bfp t\u00c6\u00b0 v\u00e1\u00ba\u00a5n v\u00e1\u00bb\u0081 s\u00e1\u00ba\u00a3n ph\u00e1\u00ba\u00a9m, b\u00e1\u00bb\u0087nh t\u00c3\u00acnh c\u00e1\u00bb\u00a5 th\u00e1\u00bb\u0083 v\u00c3\u00a0 li\u00e1\u00bb\u0087u tr\u00c3\u00acnh ph\u00c3\u00b9 h\u00e1\u00bb\u00a3p cho Anh/Ch\u00e1\u00bb\u008b nh\u00c3\u00a9.",
      "is_geoblocked_for_viewer": false
    },
    {
      "sender_name": "",
      "timestamp_ms": 1554334611125,
      "content": "T\u00c3\u00b4i mu\u00e1\u00bb\u0091n \u00c4\u0091\u00e1\u00ba\u00b7t h\u00c3\u00a0ng",
      "is_geoblocked_for_viewer": false
    },
    {
      "sender_name": "test",
      "timestamp_ms": 1554334610788,
      "content": "Ch\u00c3\u00a0o Musickhc! Ch\u00c3\u00bang t\u00c3\u00b4i c\u00c3\u00b3 th\u00e1\u00bb\u0083 gi\u00c3\u00bap g\u00c3\u00ac cho b\u00e1\u00ba\u00a1n?",
      "is_geoblocked_for_viewer": false
    },
    {
      "sender_name": "test",
      "timestamp_ms": 1554334609955,
      "content": "Customer \u00c4\u0091\u00c3\u00a3 tr\u00e1\u00ba\u00a3 l\u00e1\u00bb\u009di tin nh\u00e1\u00ba\u00afn ch\u00c3\u00a0o m\u00e1\u00bb\u00abng t\u00e1\u00bb\u00b1 \u00c4\u0091\u00e1\u00bb\u0099ng c\u00e1\u00bb\u00a7a b\u00e1\u00ba\u00a1n. \u00c4\u0090\u00e1\u00bb\u0083 thay \u00c4\u0091\u00e1\u00bb\u0095i ho\u00e1\u00ba\u00b7c g\u00e1\u00bb\u00a1 l\u00e1\u00bb\u009di ch\u00c3\u00a0o n\u00c3\u00a0y, h\u00c3\u00a3y truy c\u00e1\u00ba\u00adp ph\u00e1\u00ba\u00a7n C\u00c3\u00a0i \u00c4\u0091\u00e1\u00ba\u00b7t tin nh\u00e1\u00ba\u00afn.",
      "is_geoblocked_for_viewer": false
    }
  ]
}
I am using this code:
import json

with open('message_1.json', 'r', encoding='utf-8') as file:
    data = json.load(file)  # the with block closes the file automatically
print('message', data)
The result is
{'messages': [{'sender_name': 'test', 'timestamp_ms': 1554347140802, 'content': 'ChÃ\xa0o Anh/Chá»\x8b, Anh/Chá»\x8b vui lòng Ä\x91á»\x83 lại Sá»\x90 Ä\x90Iá»\x86N THOáº\xa0I + TÃ\x8cNH TRáº\xa0NG Bá»\x86NH Ä\x91á»\x83 DƯỢC SĨ CHUYÃ\x8aN MÃ\x94N sắp xếp tÆ° vấn vá»\x81 sản phẩm, bá»\x87nh tình cụ thá»\x83 vÃ\xa0 liá»\x87u trình phù hợp cho Anh/Chá»\x8b nhé.', 'is_geoblocked_for_viewer': False}, {'sender_name': '', 'timestamp_ms': 1554334611125, 'content': 'Tôi muá»\x91n Ä\x91ặt hÃ\xa0ng', 'is_geoblocked_for_viewer': False}, {'sender_name': 'test', 'timestamp_ms': 1554334610788, 'content': 'ChÃ\xa0o Musickhc! Chúng tôi có thá»\x83 giúp gì cho bạn?', 'is_geoblocked_for_viewer': False}, {'sender_name': 'test', 'timestamp_ms': 1554334609955, 'content': 'Customer Ä\x91ã trả lá»\x9di tin nhắn chÃ\xa0o mừng tá»± Ä\x91á»\x99ng của bạn. Ä\x90á»\x83 thay Ä\x91á»\x95i hoặc gỡ lá»\x9di chÃ\xa0o nÃ\xa0y, hãy truy cáº\xadp phần CÃ\xa0i Ä\x91ặt tin nhắn.', 'is_geoblocked_for_viewer': False}]}
Can someone help me read this file correctly as UTF-8?
Thanks
2 Answers
I just did it this way:
But I don't know whether there is a better way.
Unfortunately, whatever generated this JSON file mangled it: it encoded the text as UTF-8 and then wrote each resulting byte to the file as if it were a separate Unicode code point.
For example, à should be written as \u00e0 directly, but instead it is written as \u00c3\u00a0.
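To make the mechanism concrete, here is a minimal sketch that reproduces the damage for à and then reverses it on already-parsed JSON data. It assumes each broken code point maps one-to-one to a byte, which is how Latin-1 behaves, and the helper name fix_mojibake is hypothetical, not part of the json module. It may not suit files that mix damaged and correct strings in ways the fallback cannot detect.

```python
import json


def fix_mojibake(value):
    """Recursively undo UTF-8-bytes-read-as-Latin-1 damage (hypothetical helper)."""
    if isinstance(value, str):
        try:
            # Turn each code point back into one byte, then decode as UTF-8.
            return value.encode('latin-1').decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            return value  # does not look like this kind of damage; keep as-is
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value  # numbers, booleans, None


# How the damage arises: the two UTF-8 bytes of "à" become two code points.
broken = 'à'.encode('utf-8').decode('latin-1')
assert broken == '\u00c3\u00a0'

# One message from the question, parsed and then repaired.
raw = r'{"content": "T\u00c3\u00b4i mu\u00e1\u00bb\u0091n \u00c4\u0091\u00e1\u00ba\u00b7t h\u00c3\u00a0ng"}'
data = fix_mojibake(json.loads(raw))
print(data['content'])
```

The repair runs after json.loads rather than on the raw file text, so the JSON escapes are resolved first and only the decoded strings are touched.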
Your JSON file is broken. You have two options: