Html - extract value from span

slabbe
December 1, 2023

I want to extract a snow depth value from a weather site (https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland) into a DataFrame, specifically the snow depth for the Jordalen – Nåsen area.

The closest I’ve gotten is printing all the values using this code:

import pandas as pd
import requests 
from bs4 import BeautifulSoup 

r=requests.get('https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland')
soup = BeautifulSoup(r.content, 'html.parser') 

result=soup.find_all("span", {"class": "snow-depth__value"})

print(result)

But I’ve been unsuccessful in figuring out a way to transfer this specific value to a pandas DataFrame.

Answers

- ThisGuyCantEven
- December 1, 2023 at 5:22 pm
This worked for me in bs4. I think the actual parameter in find_all is called class_ because class is a reserved word in Python:
```
from bs4 import BeautifulSoup
from bs4.element import ResultSet
import requests
from requests.models import Response
from typing import Generator

response: Response = requests.get('https://www.yr.no/nb/sn%C3%B8dybder/NO-46/Norge/Vestland')
# response.content is raw bytes; BeautifulSoup accepts bytes or str
html: bytes = response.content
soup: BeautifulSoup = BeautifulSoup(html, 'html.parser')
# class_ has a trailing underscore because class is a reserved word
spans: ResultSet = soup.find_all('span', class_='snow-depth__value')
# Generator expression: int() runs only when the values are consumed
depths: Generator[int, None, None] = (int(span.text) for span in spans)
```
Using a generator expression here (() instead of []) means that int(span.text) is evaluated lazily, when pandas actually iterates through the values while initializing the DataFrame.
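To make the lazy evaluation concrete, here is a minimal sketch that needs neither the website nor pandas. The texts list and the parse helper are made up for illustration and stand in for the span texts and the int() call.

```python
# Stand-in for the span texts; these values are made up for illustration.
texts = ["12", "0", "35"]

calls = []


def parse(text):
    # Record each call so we can see when the conversion actually runs.
    calls.append(text)
    return int(text)


# List comprehension: parse() runs for every item immediately.
eager = [parse(t) for t in texts]
calls_eager = len(calls)

calls.clear()

# Generator expression: nothing is parsed until the generator is consumed.
lazy = (parse(t) for t in texts)
calls_before = len(calls)

values = list(lazy)      # consuming the generator triggers parse()
calls_after = len(calls)

leftover = list(lazy)    # a generator can only be consumed once
```

Because a generator is exhausted after one pass, it can only be handed to a consumer once. If the values are needed again afterwards, a list is probably the safer choice.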

You can write it to a DataFrame like this:
```
from pandas import DataFrame
df: DataFrame = DataFrame(depths, columns=['Snow'])
```
UPDATE:

I think it’s worth mentioning that this flattens all of the snow depths in that table into a 1D structure, when in reality they form a sort of Nx3 2D array, where N is the number of rows.
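If the Nx3 shape matters, one option is to regroup the flat values into rows of three before building the DataFrame. This is only a sketch: chunk_rows is a hypothetical helper, the sample numbers are made up, and it assumes every table row contributes exactly three values in document order, which I have not checked against the page.

```python
def chunk_rows(values, width=3):
    """Group a flat sequence into consecutive rows of `width` items each."""
    values = list(values)  # also accepts a generator
    if len(values) % width != 0:
        # Assumption violated: the flat list does not split evenly into rows.
        raise ValueError(
            f"{len(values)} values cannot be split into rows of {width}"
        )
    return [values[i:i + width] for i in range(0, len(values), width)]


# Made-up depths standing in for the scraped values.
flat = [10, 20, 30, 40, 50, 60]
rows = chunk_rows(flat)
```

The resulting list of rows can be passed to DataFrame in place of the flat sequence, with three column names instead of one.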

- kabr8
- December 1, 2023 at 5:24 pm
You can use the string attribute (i.string) to get the inner content of an HTML node.

Like this:
```
result=[]
for i in soup.find_all("span", {"class": "snow-depth__value"}):
    result.append(i.string)

# or inline

result = [i.string for i in soup.find_all("span", {"class": "snow-depth__value"})]
```
With this you have a list you can then write into a DataFrame.
```
df = pd.DataFrame(result, columns=['Snow'])
```
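Note that i.string gives text rather than numbers, and it is None when a tag has more than one child, so the column above holds strings. A minimal sketch of converting them before building the DataFrame follows. The to_depth helper and the sample values are hypothetical, and the "-" placeholder is an assumption about what the page might show for a missing reading.

```python
def to_depth(text):
    """Return the depth as an int, or None if the text is missing or not a whole number."""
    if text is None:
        return None
    cleaned = str(text).strip()
    try:
        return int(cleaned)
    except ValueError:
        # Covers empty strings and placeholders such as "-".
        return None


# Made-up values standing in for the list built from i.string.
raw = ["25", " 110 ", None, "-"]
depths = [to_depth(t) for t in raw]
```

Passing depths instead of result to pd.DataFrame should then give a numeric column, with missing readings left empty rather than stored as text.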