python request missing part of the content - Artificial Intelligence

AimeeHuang
November 21, 2018
246 views
3 votes
3 Answers

I'm scraping job content from a website (https://www.104.com.tw/job/?jobno=66wee). When I send the request, only part of the content in the 'p' element is returned. I want everything in the div class="content" part.

My code:

  import requests
  from bs4 import BeautifulSoup

  payload = {'jobno':'66wee'}
  headers = {'user-agent': 'Mozilla/5.0 (Macintosh Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
  r = requests.get('https://www.104.com.tw/job/',params = payload,headers = headers)
  soup=  BeautifulSoup(r.text, 'html.parser')
  contents = soup.findAll('div',{'class':'content'})  
  description = contents[0].findAll('p')[0].text.strip()
  print(description)

Result (the job description part is missing):

4. Develop tools and systems that optimize analysis process efficiency and report quality.ion tools.row and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.

But the HTML of this part is:

    <div class="content">
      <p>Appier is a technology company that makes it easy for businesses to use artificial intelligence to grow and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
<br>
<br>Job Description
<br>1. Perform data analysis to help Appier teams to answer business or operational questions.
<br>2. Interpret trends or patterns from complex data sets by using statistical and visualization tools.
<br>3. Conduct data analysis reports to illustrate the results and insight
<br>4. Develop tools and systems that optimize analysis process efficiency and report quality.</p>

Answers

- dataista
- November 21, 2018 at 3:35 am
- 0 votes
You are accessing only the first p element with the second [0] index:
```
description = contents[0].findAll('p')[0].text.strip()
```
You should iterate through all the p elements:
```
description = ""
for p in contents[0].findAll('p'):
    description += p.text.strip()

print(description)
```
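Note that `+=` with no separator glues consecutive paragraphs together. A minimal sketch of the same loop that keeps them on separate lines; the HTML string is a made-up stand-in, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two paragraphs; not taken from the real page.
html = '<div class="content"><p>First paragraph.</p><p>Second paragraph.</p></div>'

soup = BeautifulSoup(html, "html.parser")
content = soup.find("div", class_="content")

# Collect every <p>, then join with newlines so paragraphs stay separated.
description = "\n".join(p.get_text().strip() for p in content.find_all("p"))
print(description)
```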

```
import requests
from bs4 import BeautifulSoup

payload = {'jobno': '66wee'}
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
r = requests.get('https://www.104.com.tw/job/',
                 params=payload, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
contents = soup.findAll('div', {'class': 'content'})
for content in contents[0].findAll('p')[0].text.splitlines():
    print(content)
```
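The garbled output in the question looks like what a terminal shows when the text contains bare carriage returns (`\r`): each one moves the cursor back to the start of the line, so later segments overwrite earlier ones. That would explain why `splitlines()` helps, since it treats `\r` as a line boundary. A minimal sketch with a made-up string, assuming the page separates its lines with `\r` (not verified against the live site):

```python
# Hypothetical sample text; the "\r" separators are an assumption about the page.
raw = "Appier is a technology company.\rJob Description\r1. Perform data analysis."

# str.splitlines() breaks on "\r", "\n" and "\r\n" alike.
lines = raw.splitlines()

# Re-join with "\n" so a terminal shows every line instead of overwriting.
cleaned = "\n".join(lines)
print(cleaned)
```

If that assumption holds, the request already returned the full text and only the printing was misleading; `print(repr(text))` is a quick way to check for `\r` characters.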

- QHarr
- November 21, 2018 at 8:16 am
- 0 votes
The first element with class content contains more than this, but assuming you only want the text up to the end of point 4 (i.e. its first child p tag), you can use a descendant combinator: a class selector for the parent element and a type selector for the child. Remove the p from the selector if you really want everything.
```
import requests
from bs4 import BeautifulSoup

url = 'https://www.104.com.tw/job/?jobno=66wee'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
s = soup.select_one('.content p').text
print(s)
```
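To see the difference between the two selectors without hitting the site, here is a small sketch on an inline HTML string. The markup is a hypothetical stand-in for the job page, and `html.parser` is used instead of `lxml` only to avoid the extra dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the job page; not the real markup.
html = """
<div class="content">
  <p>Company intro.<br>Job Description<br>1. First duty.</p>
  <dl><dt>Category</dt><dd>Data analyst</dd></dl>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: first <p> inside an element with class "content".
first_p = soup.select_one(".content p")

# Class selector alone: the whole first element with class "content".
whole = soup.select_one(".content")

# separator="\n" puts each text fragment (split by <br>) on its own line.
p_text = first_p.get_text(separator="\n", strip=True)
all_text = whole.get_text(separator="\n", strip=True)
print(p_text)
```

Here `p_text` stops at the end of the paragraph, while `all_text` also includes the text of the `dl` that follows it.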