How can I get the whole table from the site? There are more rows in the table, but my code only returns 229 rows. Here is my code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://sosyalkedi.com/services"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
all_data = []
for tr in soup.select("tr:not(:has(td[colspan], th))"):
    prev = tr.find_previous("td", attrs={"colspan": True})
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    all_data.append([prev.get_text(strip=True), *tds[:5]])
df = pd.DataFrame(
    all_data,
    columns=["Parent", "ID", "Servis", "1000 adet fiyatı", "Minimum Sipariş", "Maksimum Sipariş"],
)
print(df.head())
I guess the problem is with fetching the HTML from the site in the first place. When I inspect the page, it shows different HTML.
2 Answers
Switch to the lxml parser instead (the lxml library is required), i.e. pass "lxml" rather than "html.parser" to BeautifulSoup.
In this case, the parse tree generated by html.parser is different from the lxml-generated tree. You can refer to the parser comparison table in the Beautiful Soup documentation for the differences between supported parsers.

You could use pandas.read_html(). It might take a while though: it took me around 2 minutes, but got all 4041 tables.
Here is just some example code I used:
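The answer's own snippet is not reproduced here. As a stand-in, the following is a minimal sketch of both suggestions, not the answerers' actual code. It runs on a small inline table (`SAMPLE_HTML`, a hypothetical substitute for the real page) so that it works offline; the helper names `rows_with_bs4` and `tables_with_pandas` are made up for illustration, and the `None` guard on `prev` is an addition that the question's code does not have.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: header row, category row, two service rows.
SAMPLE_HTML = """
<table>
  <tr><th>ID</th><th>Servis</th></tr>
  <tr><td colspan="2">Category A</td></tr>
  <tr><td>1</td><td>Foo</td></tr>
  <tr><td>2</td><td>Bar</td></tr>
</table>
"""


def rows_with_bs4(html, parser="lxml"):
    """Row extraction as in the question, with the parser made configurable."""
    soup = BeautifulSoup(html, parser)
    rows = []
    # Keep only data rows: skip rows that contain a colspan cell or a header cell.
    for tr in soup.select("tr:not(:has(td[colspan], th))"):
        prev = tr.find_previous("td", attrs={"colspan": True})
        parent = prev.get_text(strip=True) if prev else None  # added guard
        tds = [td.get_text(strip=True) for td in tr.select("td")]
        rows.append([parent, *tds[:5]])
    return rows


def tables_with_pandas(html):
    """Let pandas locate every <table>; returns a list of DataFrames."""
    return pd.read_html(io.StringIO(html))


bs4_rows = rows_with_bs4(SAMPLE_HTML)            # first answer: lxml parser
pandas_tables = tables_with_pandas(SAMPLE_HTML)  # second answer: pandas.read_html
print(bs4_rows)
print(pandas_tables[0])
```

For the page in the question, one would pass `requests.get(url).text` (or `.content` for Beautiful Soup) in place of `SAMPLE_HTML`. Note that `pandas.read_html()` keeps the category rows as ordinary rows, so they would still need to be separated out afterwards.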