How can I get the whole table from this site? The table has more rows, but my code only returns 229. Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://sosyalkedi.com/services"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
for tr in soup.select("tr:not(:has(td[colspan], th))"):
    prev = tr.find_previous("td", attrs={"colspan": True})
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    all_data.append([prev.get_text(strip=True), *tds[:5]])

df = pd.DataFrame(
    all_data,
    columns=["Parent", "ID", "Servis", "1000 adet fiyatı", "Minimum Sipariş", "Maksimum Sipariş"],
)
print(df.head())

I guess the problem is in how the HTML is fetched from the site in the first place. When I inspect the page in the browser, it shows different HTML.

2 Answers


  1. Switch to the lxml parser instead (lxml library is required):

    soup = BeautifulSoup(requests.get(url).content, "lxml")
    

    In this case, the parse tree generated by html.parser differs from the one generated by lxml, so the two parsers do not see the same set of rows. For a comparison of the supported parsers, see the parser table in the Beautiful Soup documentation.
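
To check whether the parser is really what limits the row count, it can help to parse the same markup with each installed parser and compare the number of rows found. The sketch below is only an illustration: `SAMPLE_HTML` and `count_rows` are hypothetical names, and the sample markup is made up rather than taken from the site.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup (not copied from the real page).
SAMPLE_HTML = """
<table>
  <tr><td colspan="5">Category A</td></tr>
  <tr><td>1</td><td>Service one</td></tr>
  <tr><td>2</td><td>Service two</td></tr>
</table>
"""


def count_rows(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <tr> elements}, or None if a parser is not installed."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The requested parser library is not available in this environment.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("tr"))
    return counts


row_counts = count_rows(SAMPLE_HTML)
print(row_counts)
```

On the real page you would pass `requests.get(url).content` instead of the sample string. If the counts differ between parsers, the parser is the likely cause; if they all agree but are still lower than what the browser shows, the missing rows are probably not in the downloaded HTML at all.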

  2. You could use pandas.read_html()

    It might take a while, though: it took me around 2 minutes, but it got all 4041 tables.

    Here is the example code I used:

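    import pandas as pd
    import ssl
    
    # Workaround: disable HTTPS certificate verification (insecure; only needed if verification fails)
    ssl._create_default_https_context = ssl._create_unverified_context
    tables = pd.read_html('https://sosyalkedi.com/services')
    print(len(tables))
    print(tables[0].head())  # head is a method, so it has to be called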
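
Since `pandas.read_html()` returns a list of DataFrames, one per `<table>` element, the pieces usually need to be combined afterwards. A minimal sketch of that step follows, assuming the tables share the same columns. `SAMPLE_HTML` is made-up markup used so the example runs offline, and `pd.read_html` still needs a parser library such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup with two tables that share the same header.
SAMPLE_HTML = """
<table>
  <thead><tr><th>ID</th><th>Servis</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Service one</td></tr>
    <tr><td>2</td><td>Service two</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>ID</th><th>Servis</th></tr></thead>
  <tbody>
    <tr><td>3</td><td>Service three</td></tr>
  </tbody>
</table>
"""

# read_html returns one DataFrame per <table> element it finds.
tables = pd.read_html(StringIO(SAMPLE_HTML))

# Stack the per-table frames into a single DataFrame with a fresh index.
combined = pd.concat(tables, ignore_index=True)

print(len(tables), combined.shape)
```

If the real tables have differing columns, `pd.concat` fills the gaps with NaN, so it is worth checking `combined.columns` before relying on the result.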
    