My crawling code doesn't print any results - Telegram API

Yellowgreen
March 29, 2020
1 vote
2 Answers

I’m trying to make a crawler for a Korean news website.
The weird thing is that I already have working code. Here is the example:

import requests
from bs4 import BeautifulSoup
import telegram

url = 'http://www.thelec.kr/news/articleList.html?page=1&total=3836&box_idxno=&view_type=sm'
req = requests.get(url)
html = req.text
soup = BeautifulSoup(html, 'html.parser')

search_result = soup.select_one('#user-container')
news_list = search_result.select('.article-veiw-body > .article-list > .article-list-content > .list-block > .list-titles >a')

contents = []
for news in news_list:
    link = news['href']
    title = news.text
    contents.append("http://www.thelec.kr"+link + " " + title)

contents

I changed only the URL and the tags in the selector, like this:

import requests
from bs4 import BeautifulSoup
import telegram

url = 'https://news.daum.net/breakingnews/digital'
req = requests.get(url)
html = req.text
soup = BeautifulSoup(html, 'html.parser')

search_result = soup.select_one('#kakaoContent')
news_list = search_result.select('.box_etc > .cMain > .mArticle > .box_etc > .list_news2 > .cont_thumb > a')

links = []
for news in news_list:
    link = news['href']
    links.append(link)

links

All of a sudden, the result is ‘[]’, an empty list. I tried it on another website too, but got the same empty result.
I don’t understand. Both scripts look the same to me. Why does one work while the other doesn’t?

Answers

- AlexHall
- March 29, 2020 at 8:56 pm
- 0 votes
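Your selector is too narrow. Each > in it only matches a direct child, so one mismatch anywhere in the chain gives you an empty list. Try descendant selectors instead: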
```
soup.select('#kakaoContent .box_etc .list_news2 .cont_thumb a')
```
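To illustrate the difference between the child combinator (>) and the descendant combinator (a space), here is a minimal sketch. The HTML snippet is hypothetical and only mimics the class names from the question; it is not the actual markup of the Daum page, which may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, NOT the real Daum page): the link is
# wrapped in an extra <strong>, so it is a grandchild of .cont_thumb.
html = """
<div id="kakaoContent">
  <ul class="list_news2">
    <li>
      <div class="cont_thumb">
        <strong class="tit_thumb"><a href="https://example.com/a1">Article 1</a></strong>
      </div>
    </li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# '>' matches direct children only, so the wrapped <a> is not found.
strict = soup.select("#kakaoContent .list_news2 .cont_thumb > a")

# A space matches descendants at any depth, so the same <a> is found.
loose = soup.select("#kakaoContent .list_news2 .cont_thumb a")

print(len(strict), len(loose))
```

If a long chain of > steps returns nothing, shortening it one step at a time and printing the number of matches usually shows which step fails.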

- QHarr
- March 29, 2020 at 8:57 pm
- 0 votes
Your current second selector doesn’t work on the page for me. If you want to get the links to the articles on the left-hand side, you need to change your CSS selector, for example to this faster and more accurate one:
```
.list_news2 .tit_thumb >  a
```
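As a rough sketch of how that selector could be plugged into the question's code: the helper name extract_links and the sample HTML below are made up for illustration, and the sample is not the real page markup. With the live site you would pass requests.get(url).text instead of sample_html.

```python
from bs4 import BeautifulSoup

def extract_links(html, selector=".list_news2 .tit_thumb > a"):
    """Return the href of every element matching the selector.

    extract_links is a hypothetical helper, not part of any library.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Skip anchors without an href instead of raising KeyError.
    return [a["href"] for a in soup.select(selector) if a.has_attr("href")]

# Hypothetical sample markup (an assumption about the page layout).
sample_html = """
<ul class="list_news2">
  <li><strong class="tit_thumb"><a href="https://example.com/a1">First</a></strong></li>
  <li><strong class="tit_thumb"><a href="https://example.com/a2">Second</a></strong></li>
</ul>
"""

links = extract_links(sample_html)
print(links)
```

Because select() is called on the whole soup, there is no intermediate select_one() result that could be None when the container id changes.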