
Hear me out: I'm quite a newbie with Python, so I may well have messed something up here.

Here’s the error message in full:

Traceback (most recent call last):
  File "webScrapingTool.py", line 1, in <module>
    from selenium import webdriver
ModuleNotFoundError: No module named 'selenium'

I wrote the code on Ubuntu 22.04, where the default Python version is 3.10.4. I have a dual-boot system. I hadn't realized that I apparently(?) need to build a Windows executable on Windows itself, so I moved the file over and tried there. I installed Python for Windows, version 3.12.2. As far as I understand, this is possibly part one of the problem.

Keep in mind that I have tried both ‘pyinstaller’ and ‘auto-py-to-exe’ on Ubuntu, and ‘pyinstaller’ on Windows too. When I create the executable on Windows and run it, it shows the error message above.

As mentioned, I am almost brand new to Python and this is a pretty basic project, but I really need to know what is keeping me from finally making my file executable/usable for the average person.
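For what it's worth, a ModuleNotFoundError like this usually means the interpreter that ran the script (or that the bundler packaged from) simply doesn't have selenium installed. A quick sketch to check which interpreter you're on and what it can see — the module names are just the ones this script imports:

```python
# Sketch: run this with the same Python that pyinstaller uses, to see
# whether the script's dependencies are installed for that interpreter.
import importlib.util
import sys

print("Interpreter:", sys.executable)
for mod in ("selenium", "bs4", "pandas", "requests"):
    found = importlib.util.find_spec(mod) is not None
    print(f"  {mod}: {'installed' if found else 'MISSING'}")
```

If any of them print MISSING on the Windows side, `pip install` them with that same interpreter before rebuilding the executable.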

This is my code:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import re
import requests
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import StaleElementReferenceException
from requests.exceptions import RequestException, Timeout, HTTPError, ConnectionError

filename = "data"
link = input("Please enter the Google Maps link for scraping: ")

browser = webdriver.Chrome()
record = []  # scraped rows: [name, phone, address, website, emails]
e = []       # names already seen, to avoid duplicate entries
le = 0       # consecutive scrolls that loaded no new results

def Selenium_extractor():
    global le  # 'le' is reassigned below, so it must be declared global
    action = ActionChains(browser)
    prev_length = 0
    a = browser.find_elements(By.CLASS_NAME, "hfpxzc")

    # Keep scrolling the results panel until no new entries load (or 1000 found).
    while a and len(a) < 1000:
        print(len(a))
        var = len(a)
        try:
            last_element = a[-1]
            action.move_to_element(last_element).perform()
            browser.execute_script("arguments[0].scrollIntoView();", last_element)
        except StaleElementReferenceException:
            # The last element went stale mid-scroll; re-query and retry.
            a = browser.find_elements(By.CLASS_NAME, "hfpxzc")
            continue
        time.sleep(2)
        a = browser.find_elements(By.CLASS_NAME, "hfpxzc")

        if len(a) == var:
            le += 1
            if le > 20 or len(a) == prev_length:
                break  # no new results after repeated scrolls; stop
        else:
            le = 0
        prev_length = len(a)


    names_processed = False  # Flag to indicate if names are processed

    for i in range(len(a)):
        if names_processed:
            break  # If names are processed, break out of the loop
        action.move_to_element(a[i]).perform()
        time.sleep(2)
        source = browser.page_source
        soup = BeautifulSoup(source, 'html.parser')
        try:
            Item_Html = soup.find_all('div', {"class": "lI9IFe"})
            for item_html in Item_Html:
                Name_Html = item_html.find('div', {"class": "qBF1Pd fontHeadlineSmall"})
                if Name_Html is None:
                    continue  # skip result cards without a name element
                name = Name_Html.text.strip()
                if name not in e:
                    e.append(name)
                    divs = item_html.find_all('div', {"class": "W4Efsd"})

                    # Phone: take the first span that looks like an international number.
                    phone = "Not available"
                    for div in divs:
                        phone_span = div.find('span', {"class": "UsdlK"})
                        if phone_span and phone_span.text.strip().startswith("+"):
                            phone = phone_span.text.strip()
                            break

                    # Address: the third W4Efsd div holds "category · address".
                    address = "Not available"
                    if len(divs) > 2:
                        address_text = divs[2].get_text().split(' · ')
                        if len(address_text) > 1:
                            address = address_text[1].strip()

                    # Website and emails: fetch the site once and scan it for addresses.
                    website = "Not available"
                    emails = "Not available"
                    Website_Html = item_html.find('a', {"class": "lcr4fd S9kvJb"})
                    if Website_Html:
                        website = Website_Html.get('href')
                        try:
                            website_source = requests.get(website, timeout=10).text
                            found = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', website_source)
                            found = sorted({em for em in found if not em.endswith('.wixpress.com')})
                            if found:
                                emails = found
                        except (Timeout, ConnectionError) as ex:
                            print("Error scraping emails from website due to network issues:", ex)
                        except HTTPError as ex:
                            print("HTTP error occurred while accessing the website:", ex)
                        except RequestException as ex:
                            print("An error occurred while accessing the website:", ex)

                    print([name, phone, address, website, emails])
                    record.append([name, phone, address, website, emails])
            names_processed = True  # All names handled in one pass over the page source
        except Exception as ex:
            print("Error occurred:", ex)
            continue
            names_processed = True  # Set flag to indicate names are processed
        except Exception as ex:
            print("Error occurred:", ex)
            continue

    print(record)
    return record

browser.get(link)
time.sleep(10)  # give the results page time to load before scraping
Selenium_extractor()
browser.quit()

# Write the collected records to data.csv.
df = pd.DataFrame(record, columns=['Business Name', 'Phone', 'Street Address', 'Website', 'Email Addresses'])
df.to_csv(filename + '.csv', index=False, encoding='utf-8')
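One bug worth calling out on its own: the email regex in the script has lost its backslashes (`b...b` instead of `\b...\b`, plus an unescaped dot), which commonly happens when code is pasted into a forum. A corrected pattern, as a standalone sketch with a made-up sample string:

```python
import re

# Word-boundary anchors (\b) and an escaped dot (\.) restore the intended
# "find email addresses" behaviour; [A-Za-z]{2,} matches the TLD.
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

sample = "Contact us at info@example.com or sales@shop.example.co.uk for details."
print(EMAIL_RE.findall(sample))  # both addresses should match
```

Without the backslashes, the pattern instead requires a literal `b` on each side of the address and matches almost nothing.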

When I try to do ‘pip install cx_freeze’ or to install the ‘requirements.txt’ with pip, I get an error message like this:

https://pastebin.com/KheA21nM

The lead I can give is that I'm almost certain it is related to where the program was written versus where the executable is being built (two different Python versions). I have seen a few pages that mention ‘hiddenimports’ in the .spec file, but no luck with what they suggested. Hopefully somebody knows exactly what I mean; while there are similar questions here, none of them match my situation exactly. Please let me know of anything that I can do to fix this. Thanks!
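On the ‘hiddenimports’ route: in the .spec file that pyinstaller generates next to the script, the Analysis block accepts a hiddenimports list. A minimal sketch, assuming the spec was generated for webScrapingTool.py (the module names are just this script's imports; only add the ones pyinstaller actually fails to detect):

```python
# webScrapingTool.spec (fragment) -- extra modules for PyInstaller's
# static analysis; other generated Analysis arguments left as-is.
a = Analysis(
    ['webScrapingTool.py'],
    hiddenimports=['selenium', 'bs4', 'pandas', 'requests'],
    # ... remaining generated arguments unchanged ...
)
```

After editing, rebuild with ‘pyinstaller webScrapingTool.spec’ rather than the .py file, otherwise pyinstaller regenerates the spec and discards the edits.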

2 Answers


  1. Description:

    Inside the error focus on this line:

    error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": [url here - Reddit doesn't like links, so I removed it]
    

    This indicates that you have to install (or upgrade) Microsoft Visual C++ Build Tools. pip needs the MSVC compiler to build some of these packages from source on Windows when no prebuilt wheel matches your Python version.

    Secondly, if you are familiar with manually declaring hidden imports in pyinstaller, do that for the libraries it fails to detect on its own.

  2. You have to include the path to your project's installed modules before converting it,

    like:

    pyinstaller.exe --onefile --paths=D:\env\Lib\site-packages .\foo.py
    