Why is my Google search for "Debian" returning an empty ResultSet?

AlejandroRomeroL243pez
May 31, 2023
262 views
1 vote
2 Answers

I’ve been working on Google Colab developing a script to scrape google search results. It has been working for a long time without any problem but now doesn’t. It seems that the code page source its different and the CSS classes I used to use now are diferents.
I use Selenium and BeautifulSoup and the code is the following:

# Installing Selenium after new Ubuntu update
%%shell
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF

apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF

apt-get update
apt-get install chromium chromium-driver

pip install selenium

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup 


# Parameters to use Selenium and Chromedriver
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--user-agent="'+userAgent+'"')

#options.headless = True

driver = webdriver.Chrome('chromedriver',options=options)

# Trying to scrape Google Search Results
links = [] 
url = "https://www.google.es/search?q=alergia

driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')

#This doesn't return anything
search = soup.find_all('div', class_='yuRUbf')
for h in search:
  links.append(h.a.get('href'))
print(links)

Why now the class yuRUbf doesnt work for scrape search results? Always worked for me

Trying to scrape href links from google search results using Selenium and BeautifulSoup

Answers

- HedgeHog
- January 21, 2023 at 1:59 pm
- 0 votes
0
There can be different issues, as long as your question is not that specific in this point – So always and first of all, take a look at your soup to see if all the expected ingredients are in place.
- Check if you run into consent banner redirect and handle it with selenium via clicking or sending corresponding headers.
- Classes are highly dynamic things, so change selection strategy and use more static content like id or HTML structure – used css selctors here:
```
soup.select('a:has(h3)')
```
Example:

Cause selenium is not really needed here this is a light version with requests:
```
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('https://www.google.es/search?q=alergia',headers = {'User-Agent': 'Mozilla/5.0'}, cookies={'CONSENT':'YES+'}).text)
[a.get('href').strip('/url?q=') for a in soup.select('a:has(h3)')]
```
Login or Signup to reply.

To quickly find CSS selectors on a page, you can use SelectorGadget Chrome extension.

As an alternative, you can use Google Search Engine Results API from SerpApi. It’s a paid API with a free plan.

Code example:

from serpapi import GoogleSearch

params = {                           
  "engine": "google",               # serpapi parser engine
  "q": "Coffee",
  "google_domain": "google.com",
  "gl": "us",                       #search query
  "hl": "en",                       #language
  "api_key": "..."                  #serpapi key  https://serpapi.com/manage-api-key
}

search = GoogleSearch(params)
results = search.get_dict()

links = [] 

for result in results.get('organic_results'): 
    links.append(result.get('link')) 
    
print(links)

Please signup or login to give your own answer.

Click here to cancel reply.

Why is my Google search for “Debian” returning an empty ResultSet?

Answers

Example: