I have 100 URLs, and each one returns a JSON file when I open it.
The JSON is a little complicated; it looks like this:

{
  "release": [
    {
     "id":"1234",
     "version":"1.0",
     "releaseDate":"2023-07-31",
     "xxx": "ssss",
     "yyy": "uuuu" },
    {
     "id" :"2345",
     "version": "1.1",
     "releaseDate":"2023-05-12",
     "xxx":"sssss",
      .....}
],
"user":false
}

I want to count the releases from the past 6 months, but the JSON is complicated enough that the usual approaches (json.loads … pd.read_json … normalize …) do not work.

Also, the part elided as "....." actually contains some HTML tags like the one below, so it would be better to select just "releaseDate" for filtering.

"att":"<p><em>as Alice</em> for.....  

What I tried

I can use this to count the releases for all time:

releases=len(json_data['releases'])

But how can I limit it to the past 6 months?
Any help is really appreciated!

2 Answers


  1. Create a string that contains the date from six months ago:

    six_months_ago = "2023-02-28"
    

    And then use len() with a list comprehension that only chooses items that were released on or after that date:

    releases = len([r for r in json_data["releases"] if r["releaseDate"] >= six_months_ago])
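
The cutoff in the first answer is hard-coded. As a rough sketch, it could instead be computed from today's date with only the standard library. The helper names `months_ago` and `count_recent` are made up for illustration, the sample list reuses the dates from the example JSON in the second answer, and the string comparison assumes every `releaseDate` is a zero-padded `YYYY-MM-DD` string (such strings sort alphabetically in the same order as chronologically).

```python
import calendar
from datetime import date


def months_ago(today: date, months: int) -> date:
    """Return the date `months` calendar months before `today`.

    If the target month is shorter, the day is clamped to its last day
    (e.g. Aug 31 minus 6 months becomes Feb 28).
    """
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


def count_recent(releases, today=None, months=6):
    """Count items whose "releaseDate" is on or after the cutoff."""
    cutoff = months_ago(today or date.today(), months).isoformat()
    # Items without a "releaseDate" compare as "" and are never counted.
    return sum(1 for r in releases if r.get("releaseDate", "") >= cutoff)


# Illustrative data only; real input would be json_data["releases"].
sample = [
    {"id": "1234", "releaseDate": "2023-07-31"},
    {"id": "2345", "releaseDate": "2023-05-12"},
    {"id": "485", "releaseDate": "2022-05-12"},
]

# A fixed "today" keeps the example reproducible.
print(count_recent(sample, today=date(2023, 8, 31)))
```

Only `releaseDate` is read, so HTML in the other fields does not get in the way.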
    
  2. Consider this example:

    import json
    import pandas as pd
    
    json_string = r"""{
      "release": [
        {
         "id":"1234",
         "version":"1.0",
         "releaseDate":"2023-07-31",
         "xxx": "ssss",
         "yyy": "uuuu" },
        {
         "id" :"2345",
         "version": "1.1",
         "releaseDate":"2023-05-12",
         "xxx":"sssss"},
        {
         "id" :"485",
         "version": "1.2",
         "releaseDate":"2022-05-12",
         "xxx":"sssss"}
    ],
    "user":false
    }"""
    
    data = json.loads(json_string)
    
    df = pd.DataFrame(data["release"])
    df["releaseDate"] = pd.to_datetime(df["releaseDate"], dayfirst=False)
    print(df)
    

    Prints:

         id version releaseDate    xxx   yyy
    0  1234     1.0  2023-07-31   ssss  uuuu
    1  2345     1.1  2023-05-12  sssss   NaN
    2   485     1.2  2022-05-12  sssss   NaN
    

    Then to filter this dataframe you can do:

    now_minus_6_months = pd.Timestamp.now() - pd.DateOffset(months=6)
    print(df[df["releaseDate"] > now_minus_6_months])
    

    Prints:

         id version releaseDate    xxx   yyy
    0  1234     1.0  2023-07-31   ssss  uuuu
    1  2345     1.1  2023-05-12  sssss   NaN
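
Since the question asks for a count rather than the rows themselves, the boolean mask can be summed instead of used as a filter. This is a minimal sketch, assuming pandas is installed; `count_since` is a hypothetical helper, the records mirror the example above, and the reference date is fixed so the result does not depend on when the code runs.

```python
import pandas as pd

# Illustrative records mirroring the example JSON above.
releases = [
    {"id": "1234", "version": "1.0", "releaseDate": "2023-07-31"},
    {"id": "2345", "version": "1.1", "releaseDate": "2023-05-12"},
    {"id": "485", "version": "1.2", "releaseDate": "2022-05-12"},
]

df = pd.DataFrame(releases)
df["releaseDate"] = pd.to_datetime(df["releaseDate"])


def count_since(frame, now, months=6):
    """Count rows released after `now` minus `months` calendar months."""
    cutoff = now - pd.DateOffset(months=months)
    # True counts as 1 when summed, so the sum is the number of matches.
    return int((frame["releaseDate"] > cutoff).sum())


recent_count = count_since(df, pd.Timestamp("2023-08-31"))
print(recent_count)
```

Passing `pd.Timestamp.now()` as `now` gives the count relative to the current date.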
    