skip to Main Content

I’m trying to convert the Json type data to dataframe. I extrated data from following webpage with following code.

webpage I extract data

import pandas as pd
import requests
import json

url = r'https://www.hkex.com.hk/eng/csm/MonthlyStat/data_tab_monthly_202302e.js?_=1'

response = requests.get(url)
if response.status_code != 200:
    exit(1)

text = response.text[response.text.index('=') + 1:]
data = json.loads(text)

However, I come across some problem when I trying to conver the Json to dataframe.
I have tried using following code, but the result seem didn’t change when I change the max_level.

df = pd.json_normalize(data, max_level=2)

It seems different from the JSON I met before. Too many data in one json, I only need style 1 data from all id. I am unsure if I using the right code. I’m the beginner in Json. Thanks.

2

Answers


  1. Chosen as BEST ANSWER

    Actually, I found what I want is to extract the data from JSON. And I finally found the method to take the data from JSON. Here is my code:

    import pandas as pd
    import requests
    import json
    
    url = r'https://www.hkex.com.hk/eng/csm/MonthlyStat/data_tab_monthly_202302e.js?_=1'
    
    response = requests.get(url)
    if response.status_code != 200:
        exit(1)
    
    text = response.text[response.text.index('=') + 1:]
    data = json.loads(text)
    
    df = pd.json_normalize(data, max_level=2)
    length = len(data[0]['content'][0]['table']['tr'][1:])
    SSENorthbound_df = pd.DataFrame([data[0]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])
    
    df = pd.json_normalize(data, max_level=2)
    length = len(data[1]['content'][0]['table']['tr'][1:])
    SSESouthbound_df = pd.DataFrame([data[1]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])
    
    df = pd.json_normalize(data, max_level=2)
    length = len(data[2]['content'][0]['table']['tr'][1:])
    SZSENorthbound_df = pd.DataFrame([data[2]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])
    
    df = pd.json_normalize(data, max_level=2)
    length = len(data[3]['content'][0]['table']['tr'][1:])
    SZSE_Southbound_df = pd.DataFrame([data[3]['content'][0]['table']['tr'][1:][x].get('td')[0] for x in range(length)])
    

  2. import pandas as pd
    import requests
    import json
    
    url = r'https://www.hkex.com.hk/eng/csm/MonthlyStat/data_tab_monthly_202302e.js?_=1'
    
    response = requests.get(url)
    if response.status_code != 200:
        exit(1)
    
    text = response.text[response.text.index('=') + 1:]
    data = json.loads(text)
    

    Here your data is of type dict as the json.loads function converts json to the python native datatype.

    json.normalize will normalize your dict value if there is nesting in it. In your case it is stopping at a point where there is a list value for a key.

    In this api response, the content value is a list which cannot be normalized as it is a list.

    You can read more about this here.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search