Converting Json to pd

PatrickChau
March 20, 2023
232 views
1 vote
2 Answers

I’m trying to convert the Json type data to dataframe. I extrated data from following webpage with following code.

import pandas as pd
import requests
import json

url = r'https://www.hkex.com.hk/eng/csm/MonthlyStat/data_tab_monthly_202302e.js?_=1'

response = requests.get(url)
if response.status_code != 200:
    exit(1)

text = response.text[response.text.index('=') + 1:]
data = json.loads(text)

However, I come across some problem when I trying to conver the Json to dataframe.
I have tried using following code, but the result seem didn’t change when I change the max_level.

df = pd.json_normalize(data, max_level=2)

It seems different from the JSON I met before. Too many data in one json, I only need style 1 data from all id. I am unsure if I using the right code. I’m the beginner in Json. Thanks.

Answers

Chosen as BEST ANSWER

Actually, I found what I want is to extract the data from JSON. And I finally found the method to take the data from JSON. Here is my code:

import pandas as pd
import requests
import json

url = r'https://www.hkex.com.hk/eng/csm/MonthlyStat/data_tab_monthly_202302e.js?_=1'

response = requests.get(url)
if response.status_code != 200:
    exit(1)

text = response.text[response.text.index('=') + 1:]
data = json.loads(text)

df = pd.json_normalize(data, max_level=2)
length = len(data[0]['content'][0]['table']['tr'][1:])
SSENorthbound_df = pd.DataFrame([data[0]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])

df = pd.json_normalize(data, max_level=2)
length = len(data[1]['content'][0]['table']['tr'][1:])
SSESouthbound_df = pd.DataFrame([data[1]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])

df = pd.json_normalize(data, max_level=2)
length = len(data[2]['content'][0]['table']['tr'][1:])
SZSENorthbound_df = pd.DataFrame([data[2]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])

df = pd.json_normalize(data, max_level=2)
length = len(data[3]['content'][0]['table']['tr'][1:])
SZSE_Southbound_df = pd.DataFrame([data[3]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])

(Edit)

- NikhilSharma
- March 20, 2023 at 9:55 am
- 0 votes
0
```
import pandas as pd
import requests
import json

url = r'https://www.hkex.com.hk/eng/csm/MonthlyStat/data_tab_monthly_202302e.js?_=1'

response = requests.get(url)
if response.status_code != 200:
    exit(1)

text = response.text[response.text.index('=') + 1:]
data = json.loads(text)
```
Here your data is of type dict as the json.loads function converts json to the python native datatype.

json.normalize will normalize your dict value if there is nesting in it. In your case it is stopping at a point where there is a list value for a key.

In this api response, the content value is a list which cannot be normalized as it is a list.

You can read more about this here.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.