
I scraped a site with this URL: https://takipcimerkezi.net/services

I am trying to get every column of the table except "aciklama".

This is my code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

url='https://takipcimerkezi.net/services'
page= requests.get(url)
table=BeautifulSoup(page.content, 'html.parser')

# Collect the text of each column's cells, matched by their data-label attribute.
max_sipariş = table.find_all(attrs={"data-label": "Maksimum Sipariş"})
maxsiparis = []
for i in max_sipariş:
    maxsiparis.append(i.text)

min_sipariş = table.find_all(attrs={"data-label": "Minimum Sipariş"})
minsiparis = []
for i in min_sipariş:
    # append inside the loop so every value is kept, not just the last one
    minsiparis.append(i.text)

bin_adet_fiyati = table.find_all(attrs={"data-label": "1000 adet fiyatı "})
binadetfiyat = []
for i in bin_adet_fiyati:
    binadetfiyat.append(i.text.strip())

# named id_cells rather than id, which would shadow the built-in id()
id_cells = table.find_all(attrs={"data-label": "ID"})
idlist = []
for i in id_cells:
    idlist.append(i.text)

servis = table.find_all(attrs={"data-label": "Servis"})
servislist = []
for i in servis:
    servislist.append(i.text)
 

Then I took the values and put them into an Excel sheet.

The last thing I need is a new column showing which category each row is in.

For example, the row with ID "158" is in the "Önerilen Servisler" category, and likewise IDs "4", "1526", "1", "1494" and so on; then the rows up to ID "1537" need to be in the "Instagram %100 Gerçek Premium Servisler" category.

I hope I explained the problem well. How can I do this?

2 Answers


  1. To add a parent category column to the DataFrame, you can use the following example:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    
    
    url = "https://takipcimerkezi.net/services"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    
    all_data = []
    for tr in soup.select("tr:not(:has(td[colspan], th))"):
        prev = tr.find_previous("td", attrs={"colspan": True})
        tds = [td.get_text(strip=True) for td in tr.select("td")]
        all_data.append([prev.get_text(strip=True), *tds[:5]])
    
    df = pd.DataFrame(
        all_data,
        columns=["Parent", "ID", "Servis", "1000 adet fiyatı", "Minimum Sipariş", "Maksimum Sipariş"],
    )
    print(df.head())
    df.to_csv("data.csv", index=False)
    

    Prints:

                   Parent    ID                                                                                                              Servis 1000 adet fiyatı Minimum Sipariş Maksimum Sipariş
    0  Önerilen Servisler   158      3613-🙂 Instagram Garantili Takipçi | Max 3M | Ömür Boyu Garantili | Düşüş Çok Az | Anlık Başlar | Günde 150K 🔥         13.17 TL             100          3000000
    1  Önerilen Servisler     4  1495-🙂 Instagram Garantili Takipçi | Max 1M | 365 Gün Telafi Garantili | Hızlı Başlar | 30 Gün Telafi Butonu Aktif         12.07 TL              50          5000000
    2  Önerilen Servisler  1526            4513-🙂 Instagram Takipçi | Max 500K | Yabancı Gerçek Kullanıcılar | Düşme Az | Anlık Başlar | Günde 250K         22.28 TL           10000           500000
    3  Önerilen Servisler     1            3033-🙂 Instagram Türk Takipçi | Max 25K | %90 Türk 🇹🇷 | İptal Butonu Aktif | Anlık Başlar | Saatte 1K-2K         21.49 TL              10            25000
    4  Önerilen Servisler  1494         991-🙂 Instagram Çekilişle Takipçi | %100 Organik Türk 🇹🇷 | Max 10K | Günlük İşleme Alınır | Günde 5K Atar !         37.50 TL            1000            10000
    

    and saves data.csv.


    EDIT: A short explanation of the code above:

    • First I select all data rows, i.e. rows that contain neither a table header cell nor a cell with the colspan= attribute (the text of the colspan cells becomes the "Parent" column). This is done with the CSS selector "tr:not(:has(td[colspan], th))".

    • While iterating over these data rows, I need to know each row's "Parent". For this I use tr.find_previous("td", attrs={"colspan": True}), which selects the nearest preceding <td> that has the colspan= attribute.

    • I get the text of every <td> tag in the row and store it, together with the parent text, in the all_data list.

    • From this list I create a pandas DataFrame.
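
The steps above can be tried offline on a tiny table. This is only a sketch: SAMPLE_HTML and rows_with_parent are made-up names, and the markup is an assumption about how the real page is laid out (a category row holding a single <td colspan=...>, followed by the data rows that belong to it).

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the services table (assumed layout, not the real page).
SAMPLE_HTML = """
<table>
  <thead><tr><th>ID</th><th>Servis</th></tr></thead>
  <tbody>
    <tr><td colspan="2">Category A</td></tr>
    <tr><td>158</td><td>Service one</td></tr>
    <tr><td>4</td><td>Service two</td></tr>
    <tr><td colspan="2">Category B</td></tr>
    <tr><td>1537</td><td>Service three</td></tr>
  </tbody>
</table>
"""


def rows_with_parent(html):
    """Return [parent, *cells] for every data row in the table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Keep only data rows: skip header rows and the colspan category rows.
    for tr in soup.select("tr:not(:has(td[colspan], th))"):
        # Nearest colspan cell that appears before this row in the document.
        parent = tr.find_previous("td", attrs={"colspan": True})
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        rows.append([parent.get_text(strip=True), *cells])
    return rows


rows = rows_with_parent(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each data row picks up the category that most recently preceded it, so the two rows under "Category A" share that parent and the last row gets "Category B".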

  2. Simply adapt the approach from the last post and scrape the categories first, so you can map them onto the rows while scraping the data:

    categories = dict((e.get('data-filter-category-id'),e.get('data-filter-category-name')) for e in soup.select('.dropdown-menu a[data-filter-category-name]'))
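
To see what that mapping does, here is a small offline sketch. Only the data-* attribute names and the selectors come from this answer; the surrounding markup in SAMPLE_HTML is an assumption made for illustration, as is the resolved dict.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the data-* attribute names are taken from the answer.
SAMPLE_HTML = """
<ul class="dropdown-menu">
  <li><a data-filter-category-id="1" data-filter-category-name="Category A">Category A</a></li>
  <li><a data-filter-category-id="2" data-filter-category-name="Category B">Category B</a></li>
</ul>
<table><tbody id="service-tbody">
  <tr data-filter-table-category-id="1"><td data-label="ID">158</td></tr>
  <tr data-filter-table-category-id="2"><td data-label="ID">1039</td></tr>
  <tr><td data-label="ID">7</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# category id -> category name, read from the filter dropdown
categories = {
    a.get("data-filter-category-id"): a.get("data-filter-category-name")
    for a in soup.select(".dropdown-menu a[data-filter-category-name]")
}

# service id -> category name, looked up through the row's category id
resolved = {}
for tr in soup.select("#service-tbody tr"):
    service_id = tr.select_one('[data-label="ID"]').get_text(strip=True)
    # dict.get returns None when the row has no category id attribute
    resolved[service_id] = categories.get(tr.get("data-filter-table-category-id"))

print(categories)
print(resolved)
```

Using dict.get for the lookup means a row without a category id ends up with None instead of raising a KeyError.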
    

    Example

    from bs4 import BeautifulSoup
    import pandas as pd
    import requests
    
    url='https://takipcimerkezi.net/services'
    
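    # the user_currency cookie asks the site to show prices in EUR
    soup = BeautifulSoup(
        requests.get(
            url,
            cookies={'user_currency':'27d210f1c3ff7fe5d18b5b41f9b8bb351dd29922d175e2a144af68924e3064d1a%3A2%3A%7Bi%3A0%3Bs%3A13%3A%22user_currency%22%3Bi%3A1%3Bs%3A3%3A%22EUR%22%3B%7D;'}
        ).text,
        'html.parser'  # name the parser explicitly to avoid bs4's parser warning
    )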
    
    categories = dict((e.get('data-filter-category-id'),e.get('data-filter-category-name')) for e in soup.select('.dropdown-menu a[data-filter-category-name]'))
    
    data =  []
    
    for e in soup.select('#service-tbody tr:has([data-label="Minimum Sipariş"])'):
        d = dict(zip(e.find_previous('thead').stripped_strings,e.stripped_strings))
        d['category'] = categories[e.get('data-filter-table-category-id')] if e.get('data-filter-table-category-id') else None
        data.append(d)
     
    pd.DataFrame(data)[['ID',  'category', 'Servis', '1000 adet fiyatı', 'Minimum Sipariş','Maksimum Sipariş']]
    

    Output

    ID category Servis 1000 adet fiyatı Minimum Sipariş Maksimum Sipariş
    0 158 Önerilen Servisler 3613-🙂 Instagram Garantili Takipçi | Max 3M | Ömür Boyu Garantili | Düşüş Çok Az | Anlık Başlar| Günde 150K 🔥 ≈ 0.6573 € 100 3000000
    1 4 Önerilen Servisler 1495-🙂 Instagram Garantili Takipçi | Max 1M | 365 Gün Telafi Garantili | Hızlı Başlar | 30 Gün Telafi Butonu Aktif ≈ 0.6024 € 50 5000000
    1326 1039 Spotify Türk Dinlenme 🇹🇷 1833-⬆️ Spotify Premium Türk Dinlenme | 5K Tek Paket | Normal ≈ 4.9778 € 5000 5000
    1327 1040 Spotify Türk Dinlenme 🇹🇷 1834-⬆️ Spotify Premium Türk Dinlenme | 10K Tek Paket | Normal ≈ 4.9778 € 10000 10000
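
The dict(zip(...)) line in the loop pairs the header texts with the row's cell texts by position. A minimal sketch on a one-row table, where SAMPLE_HTML is made up and its layout is an assumption about the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table (assumed layout, not the real page).
SAMPLE_HTML = """
<table>
  <thead><tr><th>ID</th><th>Servis</th><th>Minimum Sipariş</th></tr></thead>
  <tbody id="service-tbody">
    <tr>
      <td data-label="ID">158</td>
      <td data-label="Servis">Service one</td>
      <td data-label="Minimum Sipariş">100</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
row = soup.select_one('#service-tbody tr:has([data-label="Minimum Sipariş"])')

# Header texts from the nearest <thead> before the row, cell texts from the row.
header = list(row.find_previous("thead").stripped_strings)
values = list(row.stripped_strings)

# zip pairs them by position: first header with first value, and so on.
record = dict(zip(header, values))
print(record)
```

Because stripped_strings skips empty strings, an empty cell would shift the later values one column to the left, so this pairing assumes every cell in the row contains some text.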