
I want to extract a snow depth value from a weather site (https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland) into a pandas DataFrame, specifically the snow depth for the Jordalen – Nåsen area.

The closest I’ve gotten is printing all the values using this code:

import pandas as pd
import requests 
from bs4 import BeautifulSoup 

r=requests.get('https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland')
soup = BeautifulSoup(r.content, 'html.parser') 

result=soup.find_all("span", {"class": "snow-depth__value"})

print(result)

However, I've been unable to figure out how to get this specific value into a pandas DataFrame.

2 Answers


  1. This worked for me with bs4. The keyword argument in find_all is called class_, because class is a reserved word in Python:

    from bs4 import BeautifulSoup
    from bs4.element import ResultSet
    import requests
    from requests.models import Response
    from typing import Generator
    
    response: Response = requests.get('https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland')
    html: bytes = response.content  # .content is raw bytes; .text would be the decoded str
    soup: BeautifulSoup = BeautifulSoup(html, 'html.parser')
    spans: ResultSet = soup.find_all('span', class_='snow-depth__value')
    depths: Generator[int, None, None] = (int(span.text) for span in spans)
    

    Using a generator expression here (() instead of []) means that int(span.text) is evaluated lazily, only when pandas iterates over the values while building the DataFrame.
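To make the lazy evaluation concrete, here is a small sketch that needs no network access. The list `span_texts` and the helper `to_int` are made up for illustration and only stand in for the scraped spans. It also shows a caveat: a generator can be consumed only once.

```python
# Made-up stand-in for the text of the scraped <span> elements.
span_texts = ["120", "95", "30"]

converted = []  # records each value at the moment it is actually converted


def to_int(text):
    """Hypothetical helper: convert to int and log that the conversion ran."""
    converted.append(text)
    return int(text)


depths = (to_int(text) for text in span_texts)
nothing_converted_yet = len(converted) == 0  # True: no conversion has run so far

first_pass = list(depths)   # consuming the generator triggers the conversions
second_pass = list(depths)  # the generator is exhausted now, so this is empty
```

If the values are needed more than once (for example to print them and then build the DataFrame), a list comprehension with [] is the safer choice.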

    You can write it to a DataFrame like this:

    import pandas as pd
    df: pd.DataFrame = pd.DataFrame(depths, columns=['Show'])
    

    UPDATE:

    I think it’s worth mentioning that this will flatten all of the snow depths in that table into a 1D structure when in reality they form sort of an Nx3 2D array where N is the number of rows.
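If the three-values-per-row layout holds, the flat list can be regrouped into rows before building the DataFrame. This is only a sketch: the numbers in `flat_depths` are made up, and the column names are hypothetical because I did not check the real table headings.

```python
import pandas as pd

# Made-up flat list standing in for the scraped depths (two table rows).
flat_depths = [120, 118, 115, 45, 40, 38]

values_per_row = 3  # taken from the "Nx3" description above

# Fail early if the page does not match the assumed layout.
if len(flat_depths) % values_per_row != 0:
    raise ValueError("number of values is not a multiple of 3")

# Slice the flat list into consecutive chunks of three.
rows = [
    flat_depths[start:start + values_per_row]
    for start in range(0, len(flat_depths), values_per_row)
]

# Hypothetical column names; replace them with the real table headings.
df = pd.DataFrame(rows, columns=["depth_1", "depth_2", "depth_3"])
```

Each DataFrame row then corresponds to one table row, so a single location can be picked by its row position instead of by counting through a flat list.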

  2. You can use the .string attribute to get the text content of an HTML node.

    Like this:

    result=[]
    for i in soup.find_all("span", {"class": "snow-depth__value"}):
        result.append(i.string)
    
    # or inline
    
    result = [i.string for i in soup.find_all("span", {"class": "snow-depth__value"})]
    

    With this you have a list that you can then write into a DataFrame:

    df=pd.DataFrame(result,columns=['Show'])
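Note that .string gives you text, not numbers, and it can be None when a tag does not have exactly one text child. A possible follow-up step, sketched here with made-up values instead of the live page, is to convert the column with pd.to_numeric:

```python
import pandas as pd

# Made-up values standing in for what `.string` returned; None models a tag
# for which `.string` found no single text child.
result = ["120", "95", None, "30"]

df = pd.DataFrame(result, columns=["Show"])

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
df["Show"] = pd.to_numeric(df["Show"], errors="coerce")

# Drop the rows that could not be converted.
clean = df.dropna(subset=["Show"]).reset_index(drop=True)
```

After this the column is numeric, so things like clean["Show"].max() behave as expected.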
    