This is part of the JSON file I got as output after running a Python script using the Telethon API.
[{"_": "Message", "id": 4589, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T14:51:03+00:00", "message": "Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same", "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "from_scheduled": false, "legacy": false, "edit_hide": false, "from_id": 356886523, "fwd_from": null, "via_bot_id": null, "reply_to_msg_id": null, "media": null, "reply_markup": null, "entities": [], "views": null, "edit_date": null, "post_author": null, "grouped_id": null, "restriction_reason": []}, {"_": "MessageService", "id": 4588, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T11:48:18+00:00", "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}, "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "legacy": false, "from_id": 1264437394, "reply_to_msg_id": null}
As you can see, the Python script has scraped the chats from a particular Telegram channel. All I need is to store the date and message fields of the JSON in a separate dataframe so that I can apply appropriate filters and produce a proper output. Can anyone help me with this?
2 Answers
I think you should use json.loads and then pandas.json_normalize to convert the JSON to a dataframe, setting max_level to control how deeply nested dictionaries are flattened.
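A minimal sketch of that suggestion, assuming pandas 1.0 or later (where json_normalize is available at the top level) and a complete, valid JSON array. The `raw` string is a shortened stand-in for the exported file, and the names `raw`, `records`, `flat`, and `df` are illustrative only.

```python
import json

import pandas as pd

# Shortened stand-in for the exported file contents (assumption: the real
# file is a complete, valid JSON array of message objects).
raw = """[
  {"_": "Message", "id": 4589,
   "to_id": {"_": "PeerChannel", "channel_id": 1399858792},
   "date": "2020-09-03T14:51:03+00:00",
   "message": "Looking for product managers"},
  {"_": "MessageService", "id": 4588,
   "to_id": {"_": "PeerChannel", "channel_id": 1399858792},
   "date": "2020-09-03T11:48:18+00:00",
   "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}}
]"""

# Parse the JSON text into a list of dicts.
records = json.loads(raw)

# Flatten one level of nesting; nested keys become dotted column names
# such as "to_id.channel_id".
flat = pd.json_normalize(records, max_level=1)

# Keep only the two columns of interest. Records without a "message" key
# (for example MessageService entries) get NaN in that column.
df = flat[["date", "message"]]
print(df)
```

Selecting the two columns after flattening keeps the nested fields available in `flat` in case they are needed for later filtering.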
If the data is a string like '[{...}, {...}]', convert it with data = json.loads(data), first.
'date' and the corresponding 'message' can be extracted from the list of dicts with a list comprehension.
Iterate through each dict in the list, and use dict.get for the key. If the key doesn’t exist, None is returned.
Alternatively, filter out the records where 'message' is None.
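The list-comprehension approach from this answer might look like the sketch below. It assumes the data is a complete JSON array held in a string. The sample is shortened, and the names `rows`, `df_all`, and `df_messages` are made up for illustration.

```python
import json

import pandas as pd

# Shortened stand-in for the exported data (assumption: valid JSON array).
data = """[
  {"_": "Message", "id": 4589, "date": "2020-09-03T14:51:03+00:00",
   "message": "Looking for product managers"},
  {"_": "MessageService", "id": 4588, "date": "2020-09-03T11:48:18+00:00",
   "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}}
]"""

# Convert the string into a list of dicts.
data = json.loads(data)

# dict.get returns None when a key is missing, so service messages
# (which have no "message" key) do not raise a KeyError.
rows = [{"date": d.get("date"), "message": d.get("message")} for d in data]

df_all = pd.DataFrame(rows)

# Parse the ISO 8601 strings into timezone-aware datetimes for filtering.
df_all["date"] = pd.to_datetime(df_all["date"])

# Optionally keep only rows that actually carry message text.
df_messages = df_all.dropna(subset=["message"])
print(df_messages)
```

Once the date column holds datetimes, ordinary comparisons such as `df_messages[df_messages["date"] >= "2020-09-03"]` can be used for filtering.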