
I prepared three small scripts that should, in theory, do the same thing, but two of them do not work properly, and I'm not sure what is wrong. I'm using PyCharm, and the packages were installed inside the project rather than globally with pip.

The first script doesn't give me any results, just "Process finished with exit code 0".

import requests
import bs4

text = "Python"
url = 'https://google.com/search?q=' + text
request_result = requests.get(url)

soup = bs4.BeautifulSoup(request_result.text, "html.parser")
heading_object = soup.find_all('h3')

for info in heading_object:
    print(info.getText())

The second script behaves the same way: only "Process finished with exit code 0".

import requests
import bs4
from urllib.parse import quote_plus

result = 'Python'
query = quote_plus(result)
link = f"https://www.google.com/search?q={query}"

request_result = requests.get(link)
soup = bs4.BeautifulSoup(request_result.text, "html.parser")

for p in soup.find_all('h3'):
    print(p.text)

The third script works fine, and I get results from the Google search.

import requests
import bs4


url = "https://www.google.com/search"
params = {"q": "Python"}
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
soup = bs4.BeautifulSoup(requests.get(url, params=params, headers=headers).content, "html.parser")

for a in soup.select("a:has(h3)"):
    print(a["href"])

Can someone please explain what is wrong with the scripts that did not work? I'm asking because in theory they should work (they are based on a tutorial). Is there a better way to scrape Google results than the ones above?

2 Answers


  1. At the risk of stating the obvious, the main difference between your scripts is that the working one sends a browser's User-Agent header. For instance, here is your first script with headers:

    import requests
    import bs4
    
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
    }
    text = "Python"
    url = 'https://google.com/search?q=' + text
    request_result = requests.get(url, headers=headers)
    
    soup = bs4.BeautifulSoup(request_result.text, "html.parser")
    heading_object = soup.find_all('h3')
    
    for info in heading_object:
        print(info.getText())
    

    Results:

    Welcome to Python.org
    Downloads
    Python For Beginners
    [...]
    

    Headers are how a client presents itself when knocking on the server's door: the server can then choose to accept the request, deny it, or answer with a different page.

  2. It won't work because heading_object is an empty list: there are no h3 elements in the response.

    So I changed the tag to h2 and then h1 to show that the parsing itself works:

    heading_object = soup.find_all('h1')
    

    This is the code:

    import requests
    import bs4
    
    text = "Python"
    url = 'https://google.com/search?q=' + text
    request_result = requests.get(url)
    
    soup = bs4.BeautifulSoup(request_result.text, "html.parser")
    heading_object = soup.find_all('h1')
    print(heading_object)
    
    for info in heading_object:
        print(info.getText())
    

    This is the result (the printed list, followed by its text):

    [<h1>Before you continue to Google</h1>]
    Before you continue to Google
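
The diagnosis in the second answer can be turned into a small check, which may help when a scraper silently prints nothing. The sketch below is only an illustration: the function names `describe_page` and `looks_like_consent_page` are made up for this example, the two HTML strings are hand-written stand-ins rather than real Google responses, and the "Before you continue" text is taken from the output shown above and may differ by region or change over time.

```python
import bs4


def describe_page(html):
    """Collect the text of every h1 and h3 heading in an HTML document."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return {
        "h1": [tag.get_text(strip=True) for tag in soup.find_all("h1")],
        "h3": [tag.get_text(strip=True) for tag in soup.find_all("h3")],
    }


def looks_like_consent_page(html):
    """Guess whether the page is a consent screen instead of search results.

    Assumption: a consent page has no h3 headings and an h1 that starts
    with the wording seen in the answer above.
    """
    headings = describe_page(html)
    has_consent_title = any("Before you continue" in text for text in headings["h1"])
    return not headings["h3"] and has_consent_title


# Hand-written stand-ins for the two kinds of responses (not real Google HTML).
consent_html = "<html><body><h1>Before you continue to Google</h1></body></html>"
results_html = (
    "<html><body>"
    "<a href='https://www.python.org/'><h3>Welcome to Python.org</h3></a>"
    "</body></html>"
)

print(looks_like_consent_page(consent_html))
print(looks_like_consent_page(results_html))
print(describe_page(results_html)["h3"])
```

In the original scripts, passing `request_result.text` to `looks_like_consent_page` before looping over the h3 elements would show whether the empty output comes from the parsing or from the page that was actually returned.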
    
    Login or Signup to reply.
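
To make the first answer's point about headers concrete, the following sketch prepares two requests without sending them and prints what each would carry. It assumes nothing beyond the `requests` library already used in the question; the variable names `plain` and `spoofed` are chosen for this illustration, and the User-Agent string is the one from the working script.

```python
import requests

url = "https://www.google.com/search"
params = {"q": "Python"}
browser_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}

session = requests.Session()

# prepare_request() builds the final URL and merges the session's default
# headers, but nothing is sent over the network.
plain = session.prepare_request(requests.Request("GET", url, params=params))
spoofed = session.prepare_request(
    requests.Request("GET", url, params=params, headers=browser_headers)
)

# Both requests target exactly the same URL...
print(plain.url)
print(spoofed.url)

# ...but they introduce themselves differently to the server.
print(plain.headers["User-Agent"])    # the library default, "python-requests/<version>"
print(spoofed.headers["User-Agent"])  # the browser string supplied above
```

The URL is identical in both cases, so the way the query string is built in the first two scripts is unlikely to be the problem; the visible difference is the User-Agent, which by default identifies the client as a script rather than a browser.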