I prepared three small scripts that should, in theory, do the same thing, but two of them don't work properly. I'm not sure what could be wrong. I used PyCharm, and the packages were installed inside the project, not globally with pip.
The first script doesn't give me any results, just "Process finished with exit code 0".
import requests
import bs4
text = "Python"
url = 'https://google.com/search?q=' + text
request_result = requests.get(url)
soup = bs4.BeautifulSoup(request_result.text, "html.parser")
heading_object = soup.find_all('h3')
for info in heading_object:
    print(info.getText())
The second script behaves the same as the one above: only "Process finished with exit code 0".
import requests
import bs4
from urllib.parse import quote_plus
result = 'Python'
query = quote_plus(result)
link = f"https://www.google.com/search?q={query}"
request_result = requests.get(link)
soup = bs4.BeautifulSoup(request_result.text, "html.parser")
for p in soup.find_all('h3'):
    print(p.text)
The third script works fine; I get results from the Google search.
import requests
import bs4
url = "https://www.google.com/search"
params = {"q": "Python"}
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
soup = bs4.BeautifulSoup(requests.get(url, params=params, headers=headers).content, "html.parser")
for a in soup.select("a:has(h3)"):
    print(a["href"])
Can someone please explain what is wrong with the scripts that didn't work? I'm asking because, in theory, they should work (they are based on a tutorial). Is there a better way to scrape Google results than the approach above?
2 Answers
I feel like I'm stating the obvious, but the main difference between your scripts is that only the working one specifies a browser header. Add the same headers to your first script and it returns results. Headers are how the browser presents itself when knocking on the server's door: the server can choose to accept or deny the request.
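As a sketch, this is the first script with the headers dictionary from the third one added (the User-Agent string is copied from the question; what gets printed depends on what Google returns for the query at the time):

```python
import requests
import bs4

text = "Python"
url = "https://google.com/search?q=" + text

# Without a browser-like User-Agent, Google serves a stripped-down page
# that contains no <h3> tags, so find_all('h3') comes back empty.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}

request_result = requests.get(url, headers=headers)
soup = bs4.BeautifulSoup(request_result.text, "html.parser")
for info in soup.find_all("h3"):
    print(info.getText())
```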
It won't work because heading_object is an empty list: there are simply no h3 tags in the page Google serves to a client without browser headers. So I changed h3 to h2, and then to h1, to show that the rest of the code works: with those tags the loop does print output.
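The same diagnosis can be reproduced offline on a hypothetical snippet (the HTML here is invented for illustration, mimicking a stripped-down response where headings appear as h1/h2 rather than the h3 a browser would get):

```python
import bs4

# Hypothetical stripped-down HTML standing in for what Google serves
# to a client that doesn't send browser-like headers.
html = "<html><h1>Top</h1><h2>Sub</h2></html>"
soup = bs4.BeautifulSoup(html, "html.parser")

# find_all returns an empty list for 'h3', so the original loop body
# never runs and the script exits silently with code 0.
for tag in ("h3", "h2", "h1"):
    found = soup.find_all(tag)
    print(tag, "->", [t.get_text() for t in found] or "nothing found")
```

Checking whether find_all came back empty (instead of silently looping over nothing) would have made the failure visible immediately.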