I'm scraping job content from a website (https://www.104.com.tw/job/?jobno=66wee). When I send the request, only part of the content in the 'p' element is returned. I want everything inside the div class="content" part.
My code:
import requests
from bs4 import BeautifulSoup
payload = {'jobno': '66wee'}
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
r = requests.get('https://www.104.com.tw/job/', params=payload, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# Every <div class="content"> on the page
contents = soup.find_all('div', {'class': 'content'})
# Text of the first <p> inside the first div.content
description = contents[0].find_all('p')[0].text.strip()
print(description)
Result (the job description part is missing):
4. Develop tools and systems that optimize analysis process efficiency and report quality.ion tools.row and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
But the HTML of this part is:
<div class="content">
<p>Appier is a technology company that makes it easy for businesses to use artificial intelligence to grow and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
<br>
<br>Job Description
<br>1. Perform data analysis to help Appier teams to answer business or operational questions.
<br>2. Interpret trends or patterns from complex data sets by using statistical and visualization tools.
<br>3. Conduct data analysis reports to illustrate the results and insight
<br>4. Develop tools and systems that optimize analysis process efficiency and report quality.</p>
3 Answers
You are accessing only the first `p` element with the second `[0]` index.

You should iterate through all the `p` elements.

There is more within the first `content` class tag, but assuming you want just up to the end of point 4, i.e. the first child `p` tag, you can use a descendant combinator with a class selector for the parent element and an element selector for the child. Remove the `p` from the selector if you truly want everything.
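
The approaches described in the answers above can be sketched as follows. This is a minimal illustration, not the answerers' original code: `SAMPLE_HTML` is a shortened, hypothetical stand-in for the real page, and `clean` is a helper introduced here only for the example. It uses `find_all` to iterate over the `p` elements and `select_one` with the `.content p` descendant selector.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page in the question.
SAMPLE_HTML = """
<div class="content">
<p>Appier is a technology company.
<br>
<br>Job Description
<br>1. Perform data analysis.
<br>2. Interpret trends.</p>
<p>Second paragraph.</p>
</div>
<div class="content"><p>Another block.</p></div>
"""


def clean(text):
    """Drop carriage returns and blank lines, and trim each remaining line."""
    lines = (line.strip() for line in text.replace("\r", "").split("\n"))
    return "\n".join(line for line in lines if line)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Iterate over every <p> inside the first div.content instead of taking [0].
first_content = soup.find("div", {"class": "content"})
paragraphs = [clean(p.get_text("\n")) for p in first_content.find_all("p")]

# Descendant combinator: the first <p> under an element with class "content".
first_p = clean(soup.select_one(".content p").get_text("\n"))

# Same selector without the p: all text inside the first .content element.
everything = clean(soup.select_one(".content").get_text("\n"))

print("\n\n".join(paragraphs))
```

Passing `"\n"` as the separator to `get_text` puts each `<br>`-separated fragment on its own line. The `replace("\r", "")` step in `clean` rests on an assumption: the result shown in the question looks as if later lines were printed over earlier ones, which is what a terminal does when the text contains carriage returns. If that is the cause on the real page, stripping them before printing should make the full description visible.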