
I have followed the official documentation to set up Apache Spark on my local Windows 11 machine.

This setup includes:

  1. Proper installation of Apache Spark, including setting up the environment variables, etc.
  2. Creation of a virtual environment specifically for Python 3.9 to ensure compatibility with PySpark.

Despite these steps, I’m encountering a showString error in VS Code:
I can initiate a Spark session successfully and it starts without errors, but I run into problems when trying to use df.show() to display DataFrame contents. The method fails with an error raised from showString.

I’m not sure whether the current combination of Java 17 and Spark 3.5 supports showString on Windows 11.

But any suggestions are highly appreciated 🙂

[Screenshot 1](https://i.sstatic.net/2fBDWU3M.png)
[Screenshot 2](https://i.sstatic.net/3mjFNGlD.png)
[Screenshot 3](https://i.sstatic.net/gTm3ecIz.png)

I’ve tried multiple debugging steps: verifying that the environment variables point to the correct locations and making sure that the Spark session starts.
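As a side note, the checks described above can be scripted. This is a hedged sketch (the exact variable set depends on your installation; `env_report` is a hypothetical helper name):

```python
import os
import shutil

def env_report() -> str:
    """Print the environment variables a local PySpark setup typically
    relies on, plus whether a 'python3' executable resolves on PATH
    (Spark workers may shell out to it)."""
    lines = [
        f"{var} = {os.environ.get(var, '<not set>')}"
        for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME", "PYSPARK_PYTHON")
    ]
    lines.append(f"python3 on PATH: {shutil.which('python3')}")
    return "\n".join(lines)

print(env_report())
```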

2 Answers


  1. Chosen as BEST ANSWER

    Error Message:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    Cell In[45], line 2
          1 df = spark.createDataFrame([(1, 'Alice'), (2, 'Bob')], ['id', 'name'])
    ----> 2 df.show()
    
    File c:\Users\Documents\pyspark_venv\lib\site-packages\pyspark\sql\dataframe.py:945, in DataFrame.show(self, n, truncate, vertical)
        885 def show(self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) -> None:
        886     """Prints the first ``n`` rows to the console.
        887 
        888     .. versionadded:: 1.3.0
       (...)
        943     name | Bob
        944     """
    --> 945     print(self._show_string(n, truncate, vertical))
    
    File c:\Users\Documents\pyspark_venv\lib\site-packages\pyspark\sql\dataframe.py:963, in DataFrame._show_string(self, n, truncate, vertical)
        957     raise PySparkTypeError(
        958         error_class="NOT_BOOL",
        959         message_parameters={"arg_name": "vertical", "arg_type": type(vertical).__name__},
        960     )
        962 if isinstance(truncate, bool) and truncate:
    --> 963     return self._jdf.showString(n, 20, vertical)
        964 else:
        965     try:
    ...
        at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:499)
        at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:158)
        at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
        ... 34 more
    

  2. I was looking at the Spark UI to debug this and realized that SparkEnv was looking for a python3 executable file.

    • Creating a python3.exe alias of python.exe and explicitly specifying it in the Python path helped.
    import os
    from pyspark.sql import SparkSession
    
    # Define the Python executable path or alias
    python_path = r'C:\Users\AppData\Local\Programs\Python\Python39\python3.exe'
    os.environ['PYSPARK_PYTHON'] = python_path
    os.environ['PYSPARK_DRIVER_PYTHON'] = python_path
    
    # Use spark.pyspark.python (the documented config key) to point
    # worker processes at this interpreter
    spark = SparkSession.builder \
        .appName("Your App Name") \
        .config("spark.pyspark.python", python_path) \
        .getOrCreate()
    
    df = spark.createDataFrame([(1, 'Alice'), (2, 'Bob')], ['id', 'name'])
    df.show()
    
    +---+-----+
    | id| name|
    +---+-----+
    |  1|Alice|
    |  2|  Bob|
    +---+-----+
    