I need to create a Telegram bot that searches for articles on the IEEE Spectrum site using keywords the user enters. However, the bot always replies "No results were found for your request.", even though I have checked that matching articles exist on the site. Why is the bot not working correctly?

import telegram
from telegram.ext import Updater, CommandHandler
import requests
from bs4 import BeautifulSoup

# handler for the /start command
def start(update, context):
    update.message.reply_text(
        "Hello! I'll help you find articles on the IEEE Spectrum website. "
        'Just write /search and the search keywords after that.')

# handler for the /search command
def search(update, context):
    query = " ".join(context.args)
    if query == "":
        update.message.reply_text('To search, you must enter keywords after the /search command')
        return

    # the site where we will search for articles
    url = 'https://spectrum.ieee.org'
    # request the search page; requests URL-encodes the keywords passed via params
    response = requests.get(url + '/search', params={'keywords': query})

    if response.status_code == 200:
        # parsing html page using BeautifulSoup
        soup = BeautifulSoup(response.content, 'html.parser')

        # looking for articles on the search results page
        articles = soup.select('.search-result')
        if len(articles) > 0:
            for article in articles:
                title = article.select_one('.search-result-title a').text
                href = article.select_one('.search-result-title a')['href']
                message = f'{title}\n{url}{href}'
                update.message.reply_text(message)
        else:
            update.message.reply_text('No results were found for your request.')
    else:
        update.message.reply_text('Error when requesting IEEE Spectrum site.')

# creating a bot and connecting to the Telegram API
bot_token = 'YOUR_BOT_TOKEN'  # placeholder: keep the real token out of public posts
updater = Updater(token=bot_token, use_context=True)
dispatcher = updater.dispatcher

# adding command and text message handlers
start_handler = CommandHandler('start', start)
search_handler = CommandHandler('search', search)
dispatcher.add_handler(start_handler)
dispatcher.add_handler(search_handler)

# launch a bot
updater.start_polling()
updater.idle()

I tried a few changes, but nothing worked.

2 Answers


  1. Chosen as BEST ANSWER
    import telegram
    from telegram.ext import Updater, CommandHandler
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup
    
    
    # handler for the /start command
    def start(update, context):
        update.message.reply_text(
            "Hello! I'll help you find articles on the IEEE Spectrum website. "
            'Just write /search and the search keywords after that.')
    
    
    # handler for the /search command
    def search(update, context):
        query = " ".join(context.args)
        if query == "":
            update.message.reply_text('To search, you must enter keywords after the /search command')
            return
    
        # the site where we will search for articles
        url = 'https://spectrum.ieee.org'
    
        # set up the web driver
        options = Options()
        options.add_argument('--headless')
        driver = webdriver.Chrome(options=options)
        driver.get(url + '/search?keywords=' + query)
    
        # get the page source after the JavaScript has loaded
        html = driver.page_source
        driver.quit()
    
        # parsing html page using BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')
    
        # looking for articles on the search results page
        articles = soup.select('.search-result')
        if len(articles) > 0:
            for article in articles:
                title = article.select_one('.search-result-title a').text
                href = article.select_one('.search-result-title a')['href']
                message = f'{title}\n{url}{href}'
                context.bot.send_message(chat_id=update.effective_chat.id, text=message)
        else:
            update.message.reply_text('No results were found for your request.')
    
    
    # creating a bot and connecting to the Telegram API
    bot_token = 'YOUR_BOT_TOKEN'  # placeholder: keep the real token out of public posts
    updater = Updater(token=bot_token, use_context=True)
    dispatcher = updater.dispatcher
    
    # adding command and text message handlers
    start_handler = CommandHandler('start', start)
    search_handler = CommandHandler('search', search)
    dispatcher.add_handler(start_handler)
    dispatcher.add_handler(search_handler)
    
    # launch a bot
    updater.start_polling()
    updater.idle()
    

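    This is the modified code. It loads the search page in headless Chrome through Selenium, so the page's JavaScript runs before BeautifulSoup parses the page source.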


  2. The problem is that the site uses JavaScript to load its search results. requests only retrieves the static HTML that the server returns and does not execute JavaScript, so it will not work for this site. You can verify this with curl: curl -L https://spectrum.ieee.org/search/?q=aerospace. The response contains JavaScript that requests cannot run, so the results never appear in the HTML it receives.

    Instead, you might want to use a headless web driver with Selenium. Selenium spawns an actual browser instance, so JavaScript will function, and the search results will load.

    The general flow of your program can remain the same; you only need to replace the web-scraping part of your code.

    You can learn more about Selenium in its documentation.
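
One caveat with the headless approach is that `page_source` may be read before the JavaScript has finished rendering the results. Below is a minimal sketch of waiting explicitly. It assumes the results still match the `.search-result` selector and the `/search?keywords=` URL used in the question; neither has been verified against the live site. The helper names `build_search_url` and `fetch_rendered_html` are hypothetical, and Selenium is imported lazily so the URL helper works without it.

```python
from urllib.parse import urlencode

BASE_URL = 'https://spectrum.ieee.org'


def build_search_url(keywords, base_url=BASE_URL):
    """Build the search URL with the keywords safely URL-encoded."""
    # Assumption: endpoint and parameter name are taken from the question.
    query_string = urlencode({'keywords': keywords})
    return f'{base_url}/search?{query_string}'


def fetch_rendered_html(search_url, result_selector='.search-result', timeout=10):
    """Load the page in headless Chrome and return its HTML after rendering."""
    # Lazy imports: only needed when a browser is actually launched.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(search_url)
        try:
            # Block until at least one result element exists, or give up.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, result_selector))
            )
        except TimeoutException:
            # No result appeared in time; return whatever was rendered.
            pass
        return driver.page_source
    finally:
        # Always close the browser, even if loading fails.
        driver.quit()
```

For example, `html = fetch_rendered_html(build_search_url('quantum computing'))` could stand in for the `driver.get(...)` and `driver.page_source` lines, with the BeautifulSoup parsing left as it is.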

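The point about keeping the overall flow and replacing only the scraping step can be sketched by passing the fetch step in as a function. This is an illustration, not the asker's code: `run_search`, `fake_fetch`, and `fake_parse` are hypothetical names, and the stand-in functions return canned data instead of contacting the site.

```python
from typing import Callable, List, Tuple

# (title, absolute URL) pairs extracted from the results page
Article = Tuple[str, str]

NO_RESULTS = 'No results were found for your request.'


def run_search(
    query: str,
    fetch_html: Callable[[str], str],
    parse_articles: Callable[[str], List[Article]],
) -> List[str]:
    """Return the messages the bot should send for a search query."""
    html = fetch_html(query)         # swappable step: requests, Selenium, ...
    articles = parse_articles(html)  # parsing logic stays the same
    if not articles:
        return [NO_RESULTS]
    return [f'{title}\n{link}' for title, link in articles]


# Stand-ins for demonstration only; real code would fetch and parse the site.
def fake_fetch(query: str) -> str:
    return f'<html>results for {query}</html>'


def fake_parse(html: str) -> List[Article]:
    if 'robots' in html:
        return [('Robots at Work', 'https://example.com/robots-at-work')]
    return []


messages = run_search('robots', fake_fetch, fake_parse)
```

With this shape, the Telegram handler only loops over the returned messages and sends each one, so moving from requests to a headless browser changes a single argument.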