Html - How to select a specific child tag from a parent tag and scrape data from it?

NolA
July 25, 2024

There are several <div> tags with class="b-card" in the HTML file, from which I extracted the following HTML code:

    <div class="b-card">
     <div class="builder-exp-wrap">
      <a class="no-ajaxy img-wrap js-rc-link" data-href="/puravankara-limited-100046">
       <img alt="Images for Logo of Puravankara" src="https://im.proptiger.com/3/100046/13/puravankara-4491843.jpeg?width=155&amp;height=50"/>
      </a>
      <div class="builder-details-wrap">
       <a class="no-ajaxy builder-name put-ellipsis js-b-card" data-builderid="100046" href="/puravankara-limited-100046" target="_blank">
        Puravankara Limited
       </a>
      </div>
     </div>
     <div class="b-dtls">
      <div class="count-wrap one">
       <div class="circle">
        <div class="val">
         99
        </div>
       </div>
       <div class="lbl">
        Total Projects
       </div>
      </div>
      <div class="count-wrap">
       <div class="circle">
        <div class="val">
         36
        </div>
       </div>
       <div class="lbl">
        Ongoing Projects
       </div>
      </div>
     </div>
    </div>

Within each card, I want to scrape the text of the div tags where class="val". As shown below, I can iterate over every div tag where class="b-card" using the find_all() method. Within each card, I can also scrape the text under the div tag where class="builder-details-wrap", because it has a single a tag as a child. But I am not sure how to scrape the data under the div tags where class="count-wrap". Each of these parent tags has two child div tags, and I'm unsure how to select the one where class="circle", from which I then need to go down to the div tag where class="val" to scrape its text. In particular, I want the "Ongoing Projects" value, which sits in the second count-wrap.

    from bs4 import BeautifulSoup
    import requests

    main_url="https://www.proptiger.com/bangalore/all-builders?page=1"
    main_url_html=BeautifulSoup(requests.get(main_url).text,"html.parser")

    for bcard in main_url_html.find_all('div',class_='b-card'):
        bcard_CompanyName=bcard.find('div',class_='builder-details-wrap')
        bcard_CompanyName=bcard_CompanyName.a.text

        bcard_OngoingProjs=bcard.find('div',class_='count-wrap')
        bcard_OngoingProjs=bcard_OngoingProjs.div.div.text
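For reference, here is a minimal offline check of what the attempt above actually selects. It assumes a trimmed copy of the sample b-card HTML stored in a string (SAMPLE_HTML is an illustrative name, not something from the page). Because find() returns only the first match, and class_='count-wrap' also matches class="count-wrap one", the .div.div chain lands on the Total Projects value; find_all() returns both blocks in document order, so the second one can be picked by index.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the b-card markup from the question (offline, for illustration only).
SAMPLE_HTML = """
<div class="b-card">
  <div class="builder-exp-wrap">
    <div class="builder-details-wrap">
      <a class="no-ajaxy builder-name" href="/puravankara-limited-100046"> Puravankara Limited </a>
    </div>
  </div>
  <div class="b-dtls">
    <div class="count-wrap one">
      <div class="circle"><div class="val"> 99 </div></div>
      <div class="lbl"> Total Projects </div>
    </div>
    <div class="count-wrap">
      <div class="circle"><div class="val"> 36 </div></div>
      <div class="lbl"> Ongoing Projects </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
bcard = soup.find("div", class_="b-card")

# find() stops at the FIRST match; class_="count-wrap" also matches class="count-wrap one".
first_wrap = bcard.find("div", class_="count-wrap")
first_value = first_wrap.div.div.text.strip()  # .div -> "circle", .div.div -> "val"
print(first_value)  # 99 (Total Projects, not Ongoing Projects)

# find_all() returns every match in document order, so index 1 is the second block.
all_wraps = bcard.find_all("div", class_="count-wrap")
ongoing_value = all_wraps[1].find("div", class_="val").text.strip()
print(ongoing_value)  # 36
```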

Any help would be greatly appreciated.

Answers

I prefer using select over find, but that’s of course a personal choice.

With this code:

    for bcard in main_url_html.select('div.b-card'):
        bcard_CompanyName = bcard.select_one('div.builder-details-wrap a').text.strip()
        print(bcard_CompanyName)

        for project_stat in bcard.select('div.count-wrap'):
            lbl = project_stat.select_one('.lbl').text.strip()
            val = project_stat.select_one('.val').text.strip()
            print(lbl, val)

you get the following output for the first company:

    Mahindra Lifespaces Developers
    Total Projects 145
    Ongoing Projects 70
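A self-contained sketch of the same select()/select_one() loop, run against a trimmed copy of the question's sample HTML instead of the live page, so its output reflects the sample (Puravankara Limited) rather than the current site. Reading .lbl and .val from the same count-wrap keeps each label paired with its number; collecting them into a dict is an illustrative choice, not part of the answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the b-card markup from the question (offline, for illustration only).
SAMPLE_HTML = """
<div class="b-card">
  <div class="builder-exp-wrap">
    <div class="builder-details-wrap">
      <a class="no-ajaxy builder-name" href="/puravankara-limited-100046"> Puravankara Limited </a>
    </div>
  </div>
  <div class="b-dtls">
    <div class="count-wrap one">
      <div class="circle"><div class="val"> 99 </div></div>
      <div class="lbl"> Total Projects </div>
    </div>
    <div class="count-wrap">
      <div class="circle"><div class="val"> 36 </div></div>
      <div class="lbl"> Ongoing Projects </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

results = []
for bcard in soup.select("div.b-card"):
    # "div.builder-details-wrap a" = any <a> nested inside that div
    company = bcard.select_one("div.builder-details-wrap a").text.strip()
    stats = {}
    # ".count-wrap" matches both "count-wrap one" and "count-wrap"
    for project_stat in bcard.select("div.count-wrap"):
        lbl = project_stat.select_one(".lbl").text.strip()
        val = project_stat.select_one(".val").text.strip()
        stats[lbl] = val
    results.append((company, stats))

for company, stats in results:
    print(company)
    for lbl, val in stats.items():
        print(lbl, val)
```

On the sample this prints "Puravankara Limited", then "Total Projects 99" and "Ongoing Projects 36".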

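You can use a CSS selector: :nth-of-type(2) picks the second div among its siblings (here, the second count-wrap), and the selector then walks down through div.circle to div.val.

Inside your loop:

    bcard_OngoingProjs = bcard.select_one('div.count-wrap:nth-of-type(2) div.circle div.val')
    bcard_OngoingProjs = bcard_OngoingProjs.text.strip()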
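One caveat: :nth-of-type(2) counts div siblings by tag name, not by class, so this selector relies on the second count-wrap also being the second div inside b-dtls. The sketch below, again run against a trimmed copy of the sample HTML, compares that positional selector with a lookup by label text; value_for_label is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the b-card markup from the question (offline, for illustration only).
SAMPLE_HTML = """
<div class="b-card">
  <div class="builder-exp-wrap">
    <div class="builder-details-wrap">
      <a class="no-ajaxy builder-name" href="/puravankara-limited-100046"> Puravankara Limited </a>
    </div>
  </div>
  <div class="b-dtls">
    <div class="count-wrap one">
      <div class="circle"><div class="val"> 99 </div></div>
      <div class="lbl"> Total Projects </div>
    </div>
    <div class="count-wrap">
      <div class="circle"><div class="val"> 36 </div></div>
      <div class="lbl"> Ongoing Projects </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
bcard = soup.select_one("div.b-card")

# Positional: matches only because the Ongoing block is the 2nd <div> child of "b-dtls".
by_position = bcard.select_one("div.count-wrap:nth-of-type(2) div.circle div.val").text.strip()


def value_for_label(card, label):
    """Hypothetical helper: return the .val text of the count-wrap whose .lbl equals label."""
    for wrap in card.select("div.count-wrap"):
        lbl = wrap.select_one(".lbl")
        if lbl is not None and lbl.get_text(strip=True) == label:
            val = wrap.select_one(".val")
            return val.get_text(strip=True) if val is not None else None
    return None  # no block with that label in this card


by_label = value_for_label(bcard, "Ongoing Projects")
print(by_position, by_label)  # 36 36
```

The label-based lookup keeps working if the site reorders the blocks, at the cost of depending on the exact label wording.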

Full code with modifications:

    from bs4 import BeautifulSoup
    import requests

    main_url = "https://www.proptiger.com/bangalore/all-builders?page=1"
    main_url_html = BeautifulSoup(requests.get(main_url).text, "html.parser")

    for bcard in main_url_html.find_all('div', class_='b-card'):
        bcard_CompanyName = bcard.find('div', class_='builder-details-wrap')
        bcard_CompanyName = bcard_CompanyName.a.text.strip()

        bcard_OngoingProjs = bcard.select_one('div.count-wrap:nth-of-type(2) div.circle div.val')
        bcard_OngoingProjs = bcard_OngoingProjs.text.strip()
        # do something with the bcard_OngoingProjs and bcard_CompanyName variables ...
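select_one() returns None when nothing matches, so calling .text on its result raises AttributeError for any card that lacks the element. Below is a minimal sketch, assuming the same trimmed sample HTML, that guards each lookup and collects one dict per card; text_or_none and parse_builders are hypothetical names. To use it on the live page, pass requests.get(main_url).text as the html argument.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the b-card markup from the question (offline, for illustration only).
SAMPLE_HTML = """
<div class="b-card">
  <div class="builder-exp-wrap">
    <div class="builder-details-wrap">
      <a class="no-ajaxy builder-name" href="/puravankara-limited-100046"> Puravankara Limited </a>
    </div>
  </div>
  <div class="b-dtls">
    <div class="count-wrap one">
      <div class="circle"><div class="val"> 99 </div></div>
      <div class="lbl"> Total Projects </div>
    </div>
    <div class="count-wrap">
      <div class="circle"><div class="val"> 36 </div></div>
      <div class="lbl"> Ongoing Projects </div>
    </div>
  </div>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match for selector, or None if absent."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


def parse_builders(html):
    """Collect company name and ongoing-project count for every b-card in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for bcard in soup.find_all("div", class_="b-card"):
        rows.append({
            "company": text_or_none(bcard, "div.builder-details-wrap a"),
            # Same positional selector as the answer; None instead of a crash if missing.
            "ongoing": text_or_none(bcard, "div.count-wrap:nth-of-type(2) div.circle div.val"),
        })
    return rows


rows = parse_builders(SAMPLE_HTML)
print(rows)  # [{'company': 'Puravankara Limited', 'ongoing': '36'}]
```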
