I webscraped a site which has an url such as this: https://takipcimerkezi.net/services
I tried to get every information of the table except "aciklama"
This is my code :
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
url='https://takipcimerkezi.net/services'
page= requests.get(url)
table=BeautifulSoup(page.content, 'html.parser')
max_sipariş= table.find_all(attrs={"data-label":"Maksimum Sipariş"})
maxsiparis=[]
for i in max_sipariş:
value=i.text
maxsiparis.append(value)
min_sipariş= table.find_all(attrs={"data-label":"Minimum Sipariş"})
minsiparis=[]
for i in min_sipariş:
value=i.text
minsiparis.append(value)
bin_adet_fiyati= table.find_all(attrs={"data-label":"1000 adet fiyatı "})
binadetfiyat=[]
for i in bin_adet_fiyati:
value=i.text.strip()
binadetfiyat.append(value)
id= table.find_all(attrs={"data-label":"ID"})
idlist=[]
for i in id:
value=i.text
idlist.append(value)
servis= table.find_all(attrs={"data-label":"Servis"})
servislist=[]
for i in servis:
value=i.text
servislist.append(value)
Then i took the values and put them into a excel sheet like this:
But, the last thing i need is, i need to add a new column for which category a row is in.
Eg: Row with the id:"158"
is in the "Önerilen Servisler"
category. Likewise id:"4","1526","1","1494"...
and so on until id:"1537"
this row need to be in " Instagram %100 Gerçek Premium Servisler"
category.
I hope i explained the problem well how can i do such job ?
2
Answers
To add parent category column to the dataframe you can use next example:
Prints:
and saves
data.csv
(screenshot from LibreOffice):EDIT: Little bit explanation of code above:
First I select all data row (rows that don’t contain table header or cells with
colspan=
attribute (the data in this row will become our "Parent" column). This is done with CSS selector"tr:not(:has(td[colspan], th))"
When iterating over these data rows, I need to know what is the "Parent". For this I use
tr.find_previous("td", attrs={"colspan": True})
which will select<td>
with thecolspan=
attribute.I get all text from the
<td>
tags in this row and store it insideall_data
listFrom this list I create a pandas DataFrame
Simply adapt the approach from last post and scrape the categories first to map them while scraping the data:
Example
Output