
I have followed the official documentation to set up Apache Spark on my local Windows 11 machine.

This setup includes:

  1. Proper installation of Apache Spark, including setting up the environment variables, etc.
  2. Creation of a virtual environment specifically for Python 3.9 to ensure compatibility with PySpark.

Despite these steps, I’m encountering a showString error in VS Code:
I can initiate a Spark session successfully and it starts without errors, but I run into problems when trying to use df.show() to display DataFrame contents. The method fails with an error raised from showString.

I’m not sure whether the current combination of Java 17 and Spark 3.5 supports showString on Windows 11.

But any suggestions are highly appreciated 🙂

[Screenshot 1](https://i.sstatic.net/2fBDWU3M.png)
[Screenshot 2](https://i.sstatic.net/3mjFNGlD.png)
[Screenshot 3](https://i.sstatic.net/gTm3ecIz.png)

I’ve tried multiple debugging steps: verifying that the environment variables point to the correct locations and making sure that the Spark session starts.
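As a side note, the checks described above can be scripted. This is a hedged sketch (the exact variable set depends on your installation; `env_report` is a hypothetical helper name):

```python
import os
import shutil

def env_report() -> str:
    """Print the environment variables a local PySpark setup typically
    relies on, plus whether a 'python3' executable resolves on PATH
    (Spark workers may shell out to it)."""
    lines = [
        f"{var} = {os.environ.get(var, '<not set>')}"
        for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME", "PYSPARK_PYTHON")
    ]
    lines.append(f"python3 on PATH: {shutil.which('python3')}")
    return "\n".join(lines)

print(env_report())
```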

2 Answers


  1. Chosen as BEST ANSWER

    Error Message:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    Cell In[45], line 2
          1 df = spark.createDataFrame([(1, 'Alice'), (2, 'Bob')], ['id', 'name'])
    ----> 2 df.show()
    
    File c:\Users\Documents\pyspark_venv\lib\site-packages\pyspark\sql\dataframe.py:945, in DataFrame.show(self, n, truncate, vertical)
        885 def show(self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) -> None:
        886     """Prints the first ``n`` rows to the console.
        887 
        888     .. versionadded:: 1.3.0
       (...)
        943     name | Bob
        944     """
    --> 945     print(self._show_string(n, truncate, vertical))
    
    File c:\Users\Documents\pyspark_venv\lib\site-packages\pyspark\sql\dataframe.py:963, in DataFrame._show_string(self, n, truncate, vertical)
        957     raise PySparkTypeError(
        958         error_class="NOT_BOOL",
        959         message_parameters={"arg_name": "vertical", "arg_type": type(vertical).__name__},
        960     )
        962 if isinstance(truncate, bool) and truncate:
    --> 963     return self._jdf.showString(n, 20, vertical)
        964 else:
        965     try:
    ...
        at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:499)
        at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:158)
        at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
        ... 34 more
    

  2. I was looking at the Spark UI to debug this and realized that SparkEnv was looking for a python3 executable file.

    • Creating a python3.exe alias of python.exe and explicitly specifying it in the Python path helped.
    import os
    from pyspark.sql import SparkSession
    
    # Define the Python executable path or alias
    python_path = r'C:\Users\AppData\Local\Programs\Python\Python39\python3.exe'
    os.environ['PYSPARK_PYTHON'] = python_path
    os.environ['PYSPARK_DRIVER_PYTHON'] = python_path
    
    # Use spark.pyspark.python (the documented config key) to point
    # worker processes at this interpreter
    spark = SparkSession.builder \
        .appName("Your App Name") \
        .config("spark.pyspark.python", python_path) \
        .getOrCreate()
    
    df = spark.createDataFrame([(1, 'Alice'), (2, 'Bob')], ['id', 'name'])
    df.show()
    
    +---+-----+
    | id| name|
    +---+-----+
    |  1|Alice|
    |  2|  Bob|
    +---+-----+
    