I am trying to collect some information from https://www.classicalmusicartists.com/cma/artists.aspx?Artist=&lstCategory=151&selectedArtistId= using Python Selenium. The details are inside a div tag that follows a p tag, and the div is only revealed when the p tag is clicked. I can get the information from the first p tag but cannot iterate through the rest: the script keeps selecting the first p tag and never collects data from the others.
Also, is it possible to find the number of pages, so I can iterate to the end?
import time

import requests
from bs4 import BeautifulSoup as bs
from seleniumwire import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

url = 'https://www.classicalmusicartists.com/cma/artists.aspx'

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(executable_path='/home/ubuntu/selenium_drivers/chromedriver', options=options)
driver.get(url)
driver.implicitly_wait(2)

dat_html = driver.page_source
category = driver.find_element(By.ID, "ctl00_cphMainContent_lstCategory")
cat = Select(category)
cat.select_by_index(6)
driver.find_element(By.ID, "ctl00_cphMainContent_btnSearch").click()

list_span_elements = driver.find_elements("xpath", "//div[@class='artists-by-category']/div/p[@class='expand-heading']")
time.sleep(1)
for x in list_span_elements:
    driver.find_element(By.CLASS_NAME, "expand-heading").click()
    name = x.find_element("xpath", "//p['expand-heading clicked']").text
    title = x.find_element("xpath", "//div[@class='expand']").text
    manager_name = x.find_element("xpath", "//div[@class='artist-management-manager']").text
    time.sleep(0.5)
    country = x.find_element("xpath", "//div[@class='artist-management-countries']").text
    category = x.find_element("xpath", "//div[@class='artist-management-categories']").text
    contact_num = x.find_element("xpath", "//div[@class='artist-management-telephone']").text
    email = x.find_element("xpath", "//div[@class='artist-management-email']").text
    website = x.find_element("xpath", "//div[@class='artist-management-website']").text
    print(name, "\n", title, "\n", manager_name, "\n", country[9:], "\n", category[10:], "\n",
          contact_num[3:], "\n", email[3:], "\n", website[3:])
driver.find_element(By.LINK_TEXT, "Next").click()
2 Answers
Selenium is not needed, and neither is the expanding, because the content is already in the HTML; it is just not displayed.
Example
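The answer's original code block was not preserved, so here is a minimal sketch of the requests + BeautifulSoup approach it describes. The HTML fragment below is hypothetical, mirroring the class names from the question; in practice you would fetch the real page with `requests.get(url)` and pass `response.text` to BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the structure described in the question:
# the hidden details live in a div.expand that is a sibling of the p tag.
html = """
<div class="artists-by-category">
  <div>
    <p class="expand-heading">Jane Doe</p>
    <div class="expand">
      <div class="artist-management-manager">John Smith</div>
      <div class="artist-management-countries">Country: Germany</div>
      <div class="artist-management-categories">Category: Soprano</div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
artists = []
for heading in soup.select("div.artists-by-category p.expand-heading"):
    # No clicking required: the div.expand sibling is already in the DOM,
    # only hidden by CSS, so we read it directly.
    detail = heading.find_next_sibling("div", class_="expand")
    artists.append({
        "name": heading.get_text(strip=True),
        "manager": detail.select_one(".artist-management-manager").get_text(strip=True),
        "country": detail.select_one(".artist-management-countries").get_text(strip=True),
    })

print(artists)
```

Because every artist's details are parsed from the static markup, the loop visits each entry instead of stalling on the first one as the Selenium version did.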
Note: For demonstration purposes I sliced the categories; simply remove the slice to get more results. To iterate over the pages within a category as well, adapt the same approach.
Example output
A more elegant solution using Scrapy
The webpage isn't dynamic, meaning all the required data is in the static HTML DOM.
I've built the pagination into the start URLs using the
range function and a for loop.
Working code as an example:
Output: