Beautiful Soup web scraping how to get all rows from table? - Html

luthierz
March 5, 2023
295 views
1 vote
2 Answers

How can I get every row of the table on this site? The table has more rows, but my code only returns 229. Here is my code:

```
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://sosyalkedi.com/services"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
for tr in soup.select("tr:not(:has(td[colspan], th))"):
    prev = tr.find_previous("td", attrs={"colspan": True})
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    all_data.append([prev.get_text(strip=True), *tds[:5]])

df = pd.DataFrame(
    all_data,
    columns=["Parent", "ID", "Servis", "1000 adet fiyatı", "Minimum Sipariş", "Maksimum Sipariş"],
)
print(df.head())
```

I suspect the problem is in how the HTML is fetched from the site in the first place: when I inspect the page in the browser, it shows different HTML from what my code receives.
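One way to test that guess is to count the `<tr` tags in the raw markup and compare that with the number of rows the parser actually produces. The sketch below is only an illustration: `compare_row_counts` is a hypothetical helper name, the regex count is a rough approximation, and the inline `sample` string stands in for the real page so the code runs offline.

```python
import re

from bs4 import BeautifulSoup


def compare_row_counts(html, parser="html.parser"):
    """Return (rows seen in the raw markup, rows present in the parsed tree)."""
    # Rough count of opening <tr> tags in the text exactly as downloaded.
    raw_rows = len(re.findall(r"<tr[\s>]", html, flags=re.IGNORECASE))
    # Count of <tr> elements that survive parsing with the chosen parser.
    parsed_rows = len(BeautifulSoup(html, parser).find_all("tr"))
    return raw_rows, parsed_rows


# Tiny inline sample (an assumption for illustration); for the real page,
# pass requests.get(url).text instead.
sample = "<table><tr><td>1</td></tr><tr class='x'><td>2</td></tr></table>"
raw_rows, parsed_rows = compare_row_counts(sample)
print(raw_rows, parsed_rows)
```

If the raw count is much higher than the parsed count, the rows are in the download and the parser is losing them. If both counts are low, the server is probably sending different HTML from what the browser shows, and changing the parser is unlikely to help.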

Answers

- IoeCmcomc
- March 5, 2023 at 5:34 am
- 0 votes
Switch to the lxml parser instead (lxml library is required):
```
soup = BeautifulSoup(requests.get(url).content, "lxml")
```
In this case, html.parser builds a different parse tree from the one lxml builds. The Beautiful Soup documentation has a table comparing the supported parsers.
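To see whether the parser choice matters for a given page, you can parse the same markup with each installed parser and compare the row counts. This is a minimal sketch, not a definitive test: `count_rows_by_parser` is a hypothetical helper name, and the inline `sample` string is a stand-in for the downloaded page so the code runs offline.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_rows_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <tr> elements}, skipping missing parsers."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser library is not installed; leave it out of the result.
            continue
        counts[name] = len(soup.find_all("tr"))
    return counts


# Tiny inline sample (an assumption for illustration); for the real page,
# pass requests.get(url).content instead.
sample = "<table><tr><td>1</td></tr><tr><td>2</td></tr></table>"
print(count_rows_by_parser(sample))
```

On well-formed markup like the sample, the parsers should agree. A large gap between them on the real page would suggest malformed HTML that each parser repairs differently.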

- Rozto_
- March 5, 2023 at 7:03 am
- 0 votes
You could use pandas.read_html().

It might take a while, though: it took me around 2 minutes, but it got all 4041 tables.

Here is the example code I used:
```
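import pandas as pd
import ssl

# Workaround: disables HTTPS certificate verification for this process.
ssl._create_default_https_context = ssl._create_unverified_context
# read_html returns a list of DataFrames, one per <table> on the page.
tables = pd.read_html('https://sosyalkedi.com/services')
print(len(tables))
print(tables[0].head())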
```