I need to create a Telegram bot that searches for articles on the IEEE Spectrum site using keywords the user enters. But whenever I search, the bot always replies "No results were found for your request.", even though there are matching articles on the site (I checked). Why is the bot not working correctly?
import telegram
from telegram.ext import Updater, CommandHandler
import requests
from bs4 import BeautifulSoup

# handler for the /start command
def start(update, context):
    update.message.reply_text(
        "Hello! I'll help you find articles on the IEEE Spectrum website. "
        'Just write /search and the search keywords after that.')

# handler for the /search command
def search(update, context):
    query = " ".join(context.args)
    if query == "":
        update.message.reply_text('To search, you must enter keywords after the /search command')
        return
    # the site where we will search for articles
    url = 'https://spectrum.ieee.org'
    # request the search page; params= URL-encodes the keywords
    response = requests.get(url + '/search', params={'keywords': query})
    if response.status_code == 200:
        # parsing html page using BeautifulSoup
        soup = BeautifulSoup(response.content, 'html.parser')
        # looking for articles on the search results page
        articles = soup.select('.search-result')
        if len(articles) > 0:
            for article in articles:
                link = article.select_one('.search-result-title a')
                title = link.text
                href = link['href']
                # title on the first line, article URL on the second
                message = f'{title}\n{url}{href}'
                update.message.reply_text(message)
        else:
            update.message.reply_text('No results were found for your request.')
    else:
        update.message.reply_text('Error when requesting IEEE Spectrum site.')

# creating a bot and connecting to the Telegram API
bot_token = '6437672171:AAGVvRu4UNg2eR3ZinB7Ovd0NUk9ctNAVo8'
updater = Updater(token=bot_token, use_context=True)
dispatcher = updater.dispatcher
# adding command and text message handlers
start_handler = CommandHandler('start', start)
search_handler = CommandHandler('search', search)
dispatcher.add_handler(start_handler)
dispatcher.add_handler(search_handler)
# launch a bot
updater.start_polling()
updater.idle()
I tried to do something, but nothing worked
2 Answers
This is modified code
The problem is that the site is using JavaScript. requests only downloads the raw HTML and does not execute scripts, so it works for static web pages but will not work with this site. You can verify this with curl:

curl -L https://spectrum.ieee.org/search/?q=aerospace

You can see that the response contains JavaScript, which requests will not run.

Instead, you might want to use a headless web driver with Selenium. Selenium spawns an actual browser instance, so JavaScript will function, and the search results will load.
The general flow of your program should remain the same, and you only need to change out the web-scraping part of your code.
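As a rough illustration of swapping out only the scraping part, here is a minimal sketch, not a tested drop-in fix. It assumes Selenium 4 with a local Chrome install, and it reuses the .search-result and .search-result-title a selectors from the question, which have not been verified against the live page. The function names (build_search_url, fetch_rendered_html, extract_articles) are made up for this example, and the search URL follows the curl command above.

```python
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://spectrum.ieee.org"
# Assumption: selectors copied from the question, not checked against the site.
RESULT_SELECTOR = ".search-result"
TITLE_LINK_SELECTOR = ".search-result-title a"


def build_search_url(query):
    """Build the search URL, URL-encoding the user's keywords."""
    return f"{BASE_URL}/search/?{urlencode({'q': query})}"


def fetch_rendered_html(url, timeout=10):
    """Load the page in headless Chrome and return the HTML after scripts ran."""
    # Imported here so the parsing helper below can be used without Selenium.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # older Chrome versions use "--headless"
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        try:
            # Wait until at least one result element has been inserted by JavaScript.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, RESULT_SELECTOR))
            )
        except TimeoutException:
            # No result appeared in time; return the page anyway so the
            # caller can report "no results".
            pass
        return driver.page_source
    finally:
        driver.quit()


def extract_articles(html):
    """Return a list of (title, absolute_url) tuples found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for result in soup.select(RESULT_SELECTOR):
        link = result.select_one(TITLE_LINK_SELECTOR)
        if link is None or not link.get("href"):
            continue  # skip results without a usable title link
        title = link.get_text(strip=True)
        articles.append((title, urljoin(BASE_URL, link["href"])))
    return articles
```

Inside the search handler, the requests.get and BeautifulSoup lines could then be replaced with something like articles = extract_articles(fetch_rendered_html(build_search_url(query))), keeping the reply logic as it is. Starting a browser for every request is slow, so a long-running bot might prefer to create one driver and reuse it.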
You can learn more about Selenium with its documentation.