
Hi all,
I am scraping questions from Amazon using the following code:

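import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/ask/questions/asin/B0000CFLYJ/1/ref=ask_ql_psf_ql_hza?isAnswered=true"

# Render the page through a local Splash instance, waiting 3 s for JavaScript
r = requests.get("http://localhost:8050/render.html", params={'url': url, 'wait': 3})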
soup = BeautifulSoup(r.text, 'html.parser')
questions = soup.find_all('div', {'class':'a-fixed-left-grid-col a-col-right'})
print(questions)

question_list = []
for item in questions:
    question = item.find('a',{'class':'a-link-normal'}).text.strip()
    question_list.append(question)

But I keep getting the following error:

AttributeError: 'NoneType' object has no attribute 'text'

Do I need some sort of exception handler? Or should I extract the question text from a different element altogether? I've tried using the span element nested below it, but to no avail:

<div class="a-fixed-left-grid-col a-col-right" style="padding-left:0%;float:left;">
<a class="a-link-normal" href="/ask/questions/Tx150GKDGF6FGAY/ref=ask_ql_ql_al_hza">
<span class="a-declarative" data-action="ask-no-op" data-ask-no-op='{"metricName":"top-question-text-click"}' data-csa-c-func-deps="aui-da-ask-no-op" data-csa-c-id="bsypsr-tzr1os-ttv9h7-td9hn6" data-csa-c-type="widget">
  It comes in already made in a spray bottle, but yet says it's concentrated and gives dilution instructions?

So do I use it as is, or dilute?
</span>
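As a quick offline check, the snippet above can be parsed on its own to confirm that the link does contain the question text. This is a minimal sketch, assuming `beautifulsoup4` is installed; the HTML is a trimmed copy of the snippet (the `data-csa-*` attributes are dropped and the closing `</a>` and `</div>` are added so it parses cleanly), and the `clean_question` helper is hypothetical:

```python
from typing import Optional

from bs4 import BeautifulSoup

# Trimmed copy of the markup above; closing </a> and </div> added for this sketch.
SNIPPET = """
<div class="a-fixed-left-grid-col a-col-right" style="padding-left:0%;float:left;">
<a class="a-link-normal" href="/ask/questions/Tx150GKDGF6FGAY/ref=ask_ql_ql_al_hza">
<span class="a-declarative" data-action="ask-no-op">
    It comes in already made in a spray bottle, but yet says it's concentrated and gives dilution instructions?

So do I use it as is, or dilute?
  </span></a></div>
"""


def clean_question(html: str) -> Optional[str]:
    """Return the question text with whitespace collapsed, or None if there is no link."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("div.a-col-right a.a-link-normal")
    if link is None:
        return None
    # get_text() keeps the newlines and indentation; split()/join() collapses them.
    return " ".join(link.get_text().split())


print(clean_question(SNIPPET))
```

Reading from the `<a>` rather than the inner `<span>` gives the same text here, since the span is the link's only content.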

I’m just trying to scrape the first page of questions before looping through the other pages. Any help would be greatly appreciated!
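On the exception-handler question: the `AttributeError` means `item.find(...)` returned `None` for at least one matched `div`, because BeautifulSoup's `find` returns `None` when nothing matches. One plausible cause (an assumption, not verified against the live page) is that the `a-fixed-left-grid-col a-col-right` class is reused by other columns that contain no `a.a-link-normal`. A minimal sketch of skipping those with a plain `None` check instead of `try/except`, using made-up HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two divs share the class, but only the first contains a link.
HTML = """
<div class="a-fixed-left-grid-col a-col-right"><a class="a-link-normal" href="#">Q1?</a></div>
<div class="a-fixed-left-grid-col a-col-right"><span>Answer text, no link</span></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
question_list = []
for item in soup.find_all("div", {"class": "a-fixed-left-grid-col a-col-right"}):
    link = item.find("a", {"class": "a-link-normal"})
    if link is None:
        # Calling link.text here would raise the AttributeError from the question.
        continue
    question_list.append(link.get_text(strip=True))

print(question_list)
```

A `None` check keeps the loop explicit about which blocks are skipped, whereas a broad `try/except AttributeError` could also hide unrelated bugs.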

REVISION (based on @Unmitigated's answer below; scrapes multiple pages of questions). Thanks!

import requests
import pandas as pd
from bs4 import BeautifulSoup

question_list = []

# Helper functions for the scrape
def get_soup(url):
    # Render the page through the local Splash instance, waiting 3 s for JavaScript
    r = requests.get("http://localhost:8050/render.html", params={'url': url, 'wait': 3})
    return BeautifulSoup(r.text, 'html.parser')

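def get_questions(soup):
    # Each direct child div of .askTeaserQuestions is one question block
    for item in soup.select('.askTeaserQuestions > div'):
        link = item.find('a', {'class': 'a-link-normal'})
        if link is not None:  # skip blocks without a question link
            question_list.append(link.getText(strip=True))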

# Loop through pages and call the functions above (10 questions per page)
for x in range(1, 6):
    soup = get_soup(f'https://www.amazon.com/ask/questions/asin/B0000CFLYJ/{x}')
    get_questions(soup)
    print(len(question_list))

    # Stop once the "next" pagination item is disabled, i.e. this is the last page
    if soup.select_one('li.a-disabled.a-last'):
        break

# Finally, use pandas to export the questions to an Excel file (needs an Excel writer such as openpyxl)
df = pd.DataFrame({'question': question_list})
df.to_excel('SimpleGreen_Amazon_Questions_22oz_1pk_diff_seller.xlsx', index=False)
print('Web scrape and export of questions completed successfully!')
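The page loop above stops as soon as the disabled "next" item appears. The same stop condition can be exercised offline by passing the page fetcher in as a function. This is a minimal sketch: `scrape_all_pages`, its `fetch_page` argument, and the fake pages are hypothetical, and the `li.a-last.a-disabled` selector assumes Amazon's pagination markup still looks the way the revision relies on:

```python
from typing import Callable, List

from bs4 import BeautifulSoup


def scrape_all_pages(fetch_page: Callable[[int], str], max_pages: int = 50) -> List[str]:
    """Collect question links page by page until the 'next' item is disabled."""
    questions: List[str] = []
    for page in range(1, max_pages + 1):
        soup = BeautifulSoup(fetch_page(page), "html.parser")
        for item in soup.select(".askTeaserQuestions > div"):
            link = item.find("a", {"class": "a-link-normal"})
            if link is not None:
                questions.append(link.get_text(strip=True))
        # A CSS selector matches regardless of class order or extra classes.
        if soup.select_one("li.a-last.a-disabled"):
            break
    return questions


# Fake pages standing in for Splash-rendered HTML (hypothetical content).
PAGES = {
    1: '<div class="askTeaserQuestions"><div><a class="a-link-normal">Q1</a></div></div>'
       '<ul><li class="a-last"><a>Next</a></li></ul>',
    2: '<div class="askTeaserQuestions"><div><a class="a-link-normal">Q2</a></div></div>'
       '<ul><li class="a-disabled a-last">Next</li></ul>',
    3: '<div class="askTeaserQuestions"><div><a class="a-link-normal">Q3</a></div></div>',
}

print(scrape_all_pages(lambda n: PAGES[n]))
```

Page 3 is never requested because page 2 carries the disabled marker. Against the live site, `fetch_page` could wrap the Splash `requests.get(...)` call from `get_soup` and return `r.text`; `max_pages` acts as a safety cap if the marker is never found.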


Answers


  1. You can try setting the User-Agent HTTP header so that the server responds correctly:

    import re
    import requests
    from bs4 import BeautifulSoup
    
    url = 'https://www.amazon.com/ask/questions/asin/B0000CFLYJ/1/ref=ask_ql_psf_ql_hza?isAnswered=true'
    
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0'}
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    
    for s in soup.select('span:-soup-contains("Question:")'):
        # Collapse runs of whitespace (newlines, indentation) into a single space
        question = re.sub(r'\s{2,}', ' ', s.find_next('a').text.strip())
        print(question)
    

    Prints:

    This has to be diluted? Then why put it in a spray bottle? Gives the impression you can spray it on directly.
    can you use on fabric car convertable tops?
    It comes in already made in a spray bottle, but yet says it's concentrated and gives dilution instructions? So do I use it as is, or dilute?
    Is this safe to use on my car's exterior paint to remove grease stains? Thanks!!
    It it good for removing grease stains from clothes?
    I oversprayed simple green cleaning my steering wheel and have not found a way to remove the dried product from my windshield. Any suggestions?
    can this be used on new white kitchen cabinets?
    Can you use it on vinyl records?
    Can this be used Stainless steel sink?
    Can this be used to effectively clean and disinfect a soft cat carrier? How to use?
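Since the revision requests several pages, the header can be set once on a `requests.Session` so that every request reuses it (and the underlying connection). This is a minimal sketch, assuming the same Firefox User-Agent string used in this answer; the `make_session` helper is hypothetical, and whether Amazon serves the full page still depends on its bot detection:

```python
import requests

USER_AGENT = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0"


def make_session() -> requests.Session:
    """Return a Session whose default headers include a browser-like User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})
    return session


session = make_session()
# Build (but do not send) a request to confirm the session header is merged in.
req = requests.Request("GET", "https://www.amazon.com/ask/questions/asin/B0000CFLYJ/1")
prepared = session.prepare_request(req)
print(prepared.headers["User-Agent"])
# To actually fetch a page: session.get(url).content
```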
    
  2. You can select the first link in each direct child of the element with class askTeaserQuestions:

    for item in soup.select('.askTeaserQuestions > div'):
        question = item.find('a', {'class':'a-link-normal'}).getText(strip=True)
        question_list.append(question)
    
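The `>` in `.askTeaserQuestions > div` is the CSS child combinator: it matches only `div`s directly inside the container, not nested ones, so each question block is visited once. A minimal sketch on made-up markup (the nesting below is hypothetical, not Amazon's actual layout) comparing it with the descendant selector:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first question block also contains a nested div with a link.
HTML = """
<div class="askTeaserQuestions">
  <div><a class="a-link-normal" href="#">First question?</a>
       <div><a class="a-link-normal" href="#">see more answers</a></div></div>
  <div><a class="a-link-normal" href="#">Second question?</a></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Child combinator: only the two top-level question blocks.
direct = [d.find("a", {"class": "a-link-normal"}).get_text(strip=True)
          for d in soup.select(".askTeaserQuestions > div")]

# Descendant combinator: also picks up the nested div and its unrelated link.
descendant = [d.find("a", {"class": "a-link-normal"}).get_text(strip=True)
              for d in soup.select(".askTeaserQuestions div")]

print(direct)
print(descendant)
```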