
I am trying to scrape this site: https://bulkfollows.com/services
I want every service row with these fields: 'ID', 'Service', 'Rate per 1000', 'Min / Max', 'Refill', 'Avg. Time', 'Description', 'category'. I have everything except the category column. The category is a parent element that sits above each group of rows, with values like these:

" YouTube - Watch Time By Length" or "Instagram - Followers [ From  ✓VERIFIED ACCOUNTS]"

This is my code :

from bs4 import BeautifulSoup
import pandas as pd
import requests

url="https://bulkfollows.com/services"
soup = BeautifulSoup(requests.get(url).content, "lxml") 

categories = dict((e.get('data-filter-category-id'),e.get('data-filter-category-name')) for e in soup.select('.dropdown-menu button[data-filter-category-name]'))

data= []
for e in soup.select("#serviceList tr:has(td)"):    
    d = dict(zip(e.find_previous('thead').stripped_strings,e.stripped_strings))
    d['category'] = categories[e.get('data-filter-table-category-id')] if e.get('data-filter-table-category-id') else None
    data.append(d)

pd.DataFrame(data)[['ID',  'Service', 'Rate per 1000', 'Min / Max', 'Refill','Avg. Time','Description','category']]


I need help with the for loop so that each row picks up its parent category.

I want the category column to be populated instead of None, and the Description column to hold the text that appears when you click Details. For example, for the first service I want it to be:

Link: https://youtube.com/video Start: 0-12hrs Speed: 100-200 Per day Refill: 30 days

Please Note: Watch time will take 1-3 days to update on analytics.
After 3 days of delivery, if the watch time does not update, please
take a screenshot of your video analytic ( Not the Monetization page,
we don’t guarantee Monetization ) and upload it to prntscr.com and
send it us the uploaded screenshot ).

2

Answers


  1. Your Category column contains only None because the elements matched by soup.select("#serviceList tr:has(td)") do NOT have the HTML attribute data-filter-table-category-id, so e.get('data-filter-table-category-id') returns None for every row. The elements it finds look like this:

    <tr class="">
     <td class="service-id">
      7365
     </td>
     <td class="service-name">
      YouTube - Subscribers ~ Max 120k ~ 𝗥𝗘𝗙𝗜𝗟𝗟 30D ~ 500-2k/days ~ [ 𝔅𝗲𝙨𝘁 - 𝐒𝐩𝐞𝐞𝐝, 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 ]
     </td>
     <td class="service-rate">
      $4.80
     </td>
     <td class="service-min-max">
      100 / 120000
     </td>
     <td class="">
      <span class="badge gurantee">
       Refill 30 days
      </span>
     </td>
     <td class="average-time ser-id-7365">
      63 hours 40 minutes
     </td>
     <td class="text-right service-description">
      <a class="btn btn-sm btn-info" data-target="#description-7365" data-toggle="modal" href="javascript:void(0);">
       <i class="mdi mdi-information">
       </i>
       Details
      </a>
      <!-- Modal -->
      <div aria-hidden="true" aria-labelledby="description7365Label" class="modal fade text-left" id="description-7365" role="dialog" tabindex="-1">
       <div class="modal-dialog" role="document">
        <div class="modal-content">
         <div class="modal-header">
          <h5 class="modal-title" id="description7365Label">
           YouTube - Subscribers ~ Max 120k ~ 𝗥𝗘𝗙𝗜𝗟𝗟 30D ~ 500-2k/days ~ [ 𝔅𝗲𝙨𝘁 - 𝐒𝐩𝐞𝐞𝐝,                𝐐𝐮𝐚𝐥𝐢𝐭𝐲 ]'s Description
          </h5>
          <button aria-label="Close" class="close" data-dismiss="modal" type="button">
           <span aria-hidden="true">
            ×
           </span>
          </button>
         </div>
         <div class="modal-body">
          <p style="line-height: 20px;">
           Link: https://www.youtube.com/channel/UCYhvmzYNxCAGBaMhnsk69kg
           <br/>
           Start: Instant - 0 hrs
           <br/>
           Speed: 500-2k/day
           <br/>
           Refill: 30 days
           <br/>
           <br/>
           Drop: 0- 5% drop.
          </p>
         </div>
         <div class="modal-footer">
          <button class="btn btn-primary" data-dismiss="modal" type="button">
           <i class="mdi mdi-close">
           </i>
           Close
          </button>
         </div>
        </div>
       </div>
      </div>
     </td>
    </tr>
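
A minimal sketch of why the lookup comes back empty, run against a trimmed-down, hypothetical copy of the markup rather than the live page. Placing data-filter-table-category-id on a wrapper div is an assumption made for illustration; check the real page source to see which element carries it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. The attribute placement on the wrapper
# <div> is an assumption, not copied from the live site.
SAMPLE_HTML = """
<div id="serviceList">
  <div class="ser-row" data-filter-table-category-id="12">
    <table>
      <thead><tr><th>ID</th><th>Service</th></tr></thead>
      <tbody><tr class=""><td>7365</td><td>YouTube - Subscribers</td></tr></tbody>
    </table>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
row = soup.select_one("#serviceList tr:has(td)")

# Tag.get returns None when the attribute is absent, so the conditional in
# the question always ends up in its `else None` branch.
row_category_id = row.get("data-filter-table-category-id")

# Walking up the ancestors finds the nearest element that does carry it.
holder = row.find_parent(attrs={"data-filter-table-category-id": True})
parent_category_id = holder.get("data-filter-table-category-id") if holder else None

print(row_category_id, parent_category_id)
```
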
    

    From what I can tell from your post, you want to build a table similar to the ones on bulkfollows.com, with three main differences:

    1. Your table will be the aggregate of all the tables on the website.

    2. Your table will contain an additional column, Category (presumably holding the service category IDs or names).

    3. Your table's Description column will contain the text hidden behind the purple Details buttons.

    You or someone else can work out the precise solution; I will just point you in the right direction.

    General Approach:

    First, collect all of the HTML elements that make up the individual tables. These are the div elements with the classes col-lg-12 mb-3 ser-row.

    tables = soup.select('div.col-lg-12.mb-3.ser-row')
    

    Second, iterate over that list of elements.

    Then, in each iteration:

    1. Use the same logic as in your code. That is, for each row create a dictionary with the current table's column names as keys and the row's cell texts as values.

    2. Get the value of the HTML attribute data-filter-table-category-id. Create a new key, Category, and assign the attribute's value to it.

    3. Once the loop has finished, combine the dicts into a DataFrame (as you did in your code).
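
The steps above could be sketched roughly as follows. This is not a tested scraper for the live site: SAMPLE_HTML is a made-up stand-in, the helper name collect_rows and the category name in the mapping are hypothetical, and it assumes the attribute sits on the div.ser-row wrapper. It runs offline so the logic can be checked without a network request.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page; structure and values are assumptions.
SAMPLE_HTML = """
<div id="serviceList">
  <div class="col-lg-12 mb-3 ser-row" data-filter-table-category-id="1">
    <table>
      <thead><tr><th>ID</th><th>Service</th></tr></thead>
      <tbody>
        <tr><td>7365</td><td>YouTube - Subscribers</td></tr>
        <tr><td>7363</td><td>Spotify - Plays</td></tr>
      </tbody>
    </table>
  </div>
  <div class="col-lg-12 mb-3 ser-row" data-filter-table-category-id="2">
    <table>
      <thead><tr><th>ID</th><th>Service</th></tr></thead>
      <tbody><tr><td>7613</td><td>Australia Traffic from Instagram</td></tr></tbody>
    </table>
  </div>
</div>
"""


def collect_rows(html, categories=None):
    """Return one dict per service row, tagged with its table's category."""
    soup = BeautifulSoup(html, "html.parser")
    categories = categories or {}
    rows = []
    # Step 1: one wrapper <div> per table.
    for table in soup.select("div.col-lg-12.mb-3.ser-row"):
        # Step 2: the category id is read once per table, not per row.
        category_id = table.get("data-filter-table-category-id")
        headers = list(table.find("thead").stripped_strings)
        for tr in table.select("tr:has(td)"):
            record = dict(zip(headers, tr.stripped_strings))
            # Use the readable name when the id is known, else keep the id.
            record["Category"] = categories.get(category_id, category_id)
            rows.append(record)
    return rows


# Hypothetical id-to-name mapping, e.g. built from the dropdown buttons.
rows = collect_rows(SAMPLE_HTML, {"1": "High Demand Services"})
print(rows)
```

Passing the resulting list to pd.DataFrame(rows) gives the aggregated table described in step 3.
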

  2. There is no one-size-fits-all approach to scraping, so you have to select your elements more specifically; the BeautifulSoup docs describe several strategies for finding elements.

    Replace the line:

    d['category'] = categories[e.get('data-filter-table-category-id')] if e.get('data-filter-table-category-id') else None
    

    with the following, which looks back to the previous <h4> to grab the Category and into the row's modal to get the Description:

    d['Category'] = e.find_previous('h4').get_text(strip=True)
    d['Description'] = e.find('div',{'class':'modal-body'}).get_text(' ',strip=True)
    
    Example
    from bs4 import BeautifulSoup
    import pandas as pd
    import requests
    
    url = "https://bulkfollows.com/services"
    soup = BeautifulSoup(requests.get(url).content, "lxml")
    
    data = []
    # Every <tr> that contains <td> cells is one service row
    for e in soup.select("#serviceList tr:has(td)"):
        # Pair the header texts of the preceding <thead> with the row's cell texts
        d = dict(zip(e.find_previous('thead').stripped_strings, e.stripped_strings))
        # The category name is the closest <h4> heading before the row
        d['Category'] = e.find_previous('h4').get_text(strip=True)
        # Replace the "Details" button label with the text inside the row's modal
        d['Description'] = e.find('div', {'class': 'modal-body'}).get_text(' ', strip=True)
        data.append(d)
    
    pd.DataFrame(data)[['ID', 'Service', 'Rate per 1000', 'Min / Max', 'Refill', 'Avg. Time', 'Description', 'Category']]
    
    Output
    ID Service Rate per 1000 Min / Max Refill Avg. Time Description Category
    0 7365 YouTube – Subscribers ~ Max 120k ~ 𝗥𝗘𝗙𝗜𝗟𝗟 30D ~ 500-2k/days ~ [ 𝔅𝗲𝙨𝘁 – 𝐒𝐩𝐞𝐞𝐝, 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 ] $4.80 100 / 120000 Refill 30 days 59 hours 53 minutes Link: https://www.youtube.com/channel/UCYhvmzYNxCAGBaMhnsk69kg Start: Instant – 0 hrs Speed: 500-2k/day Refill: 30 days Drop: 0- 5% drop. ❖ Bulkfollows High Demand Services
    1 7363 Spotify – 𝐅𝐑𝐄𝐄 Plays ~ 𝐋𝐢𝐟𝐞𝐓𝐢𝐦𝐞 ~ 10k-50k/days ~ USA/Russian ~ [ 𝔅𝗲𝙨𝘁 – 𝐒𝐩𝐞𝐞𝐝, 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 ] $0.188 1000 / 100000000 Refill Lifetime 22 hours 26 minutes Link: https://open.spotify.com/track/40Zb4FZ6nS1Hj8RVfaLkCV Start: Instant ( Avg 0-3 hrs ) Speed: 10k to 20k days Refill: Lifetime Quality: Plays from Bot Created free accounts. Make sure you know the risk of adding of bot plays Drop: Spotify Plays are stable, do not drop. Delivery Time: It will take 2-5 days to update plays. If it’s delivery 10k in 1 day, then this 10k will take 2-5 days to update, the next 10k plays will take the next 2-5 days, and so on. ❖ Bulkfollows High Demand Services
    3973 7613 Australia Traffic from Instagram $0.025 100 / 1000000 No Refill Not enough data 💡 Use a bit.ly link to track traffic ✅ 100% Real & Unique Visitors ✅ Google Analytics Supported 🕒 Session Length: 40-60 Seconds per visit ⬇️ Bounce Rates: Low ⚡️ Speed: 10,000 unique visitors per day 🏁 Start Time: 0-12h (we check all links for compliance) 🖥️ Desktop Traffic Over 90% 📱 Mobile Traffic Under 10% ⚠️ No Adult, Drug or offensive websites allowed 🔗 Link Format: Enter Full Website URL ⚊ 🇦🇺 Website Traffic from Australia [ + Choose Referrer ]
    3974 7614 Australia Traffic from Wikipedia $0.025 100 / 1000000 No Refill Not enough data 💡 Use a bit.ly link to track traffic ✅ 100% Real & Unique Visitors ✅ Google Analytics Supported 🕒 Session Length: 40-60 Seconds per visit ⬇️ Bounce Rates: Low ⚡️ Speed: 10,000 unique visitors per day 🏁 Start Time: 0-12h (we check all links for compliance) 🖥️ Desktop Traffic Over 90% 📱 Mobile Traffic Under 10% ⚠️ No Adult, Drug or offensive websites allowed 🔗 Link Format: Enter Full Website URL ⚊ 🇦🇺 Website Traffic from Australia [ + Choose Referrer ]
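
One caveat with the loop above: find_previous and find return None when nothing matches, so a row without a modal or a preceding <h4> would raise AttributeError. A hedged variant that tolerates missing pieces might look like this; text_or_none is a hypothetical helper and SAMPLE_HTML is invented markup used only so the snippet runs offline.

```python
from bs4 import BeautifulSoup

# Invented markup: the second row deliberately has no modal.
SAMPLE_HTML = """
<div id="serviceList">
  <h4>Category A</h4>
  <table>
    <thead><tr><th>ID</th><th>Description</th></tr></thead>
    <tbody>
      <tr>
        <td>1</td>
        <td><a>Details</a>
          <div class="modal-body"><p>Link: https://example.com<br/>Start: Instant</p></div>
        </td>
      </tr>
      <tr><td>2</td><td></td></tr>
    </tbody>
  </table>
</div>
"""


def text_or_none(tag, separator=" "):
    """Return the tag's stripped text, or None if the tag was not found."""
    return tag.get_text(separator, strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
data = []
for e in soup.select("#serviceList tr:has(td)"):
    d = dict(zip(e.find_previous("thead").stripped_strings, e.stripped_strings))
    d["Category"] = text_or_none(e.find_previous("h4"))
    d["Description"] = text_or_none(e.find("div", {"class": "modal-body"}))
    data.append(d)

print(data)
```

Rows with no modal then get None in Description instead of stopping the whole scrape.
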