
I’m trying to read a parquet folder using the following code:

import pandas as pd
df = pd.read_parquet('PASP0001.parquet')

I’m working on a virtual environment. The code works perfectly if I open a Python session (note the which command):

(.venv) (base) vado@DESKTOP-JROHEGR:~/python-projects/SUSano/pysus$ which python
/home/vado/python-projects/SUSano/.venv/bin/python
(.venv) (base) vado@DESKTOP-JROHEGR:~/python-projects/SUSano/pysus$ python
Python 3.12.3 | packaged by Anaconda, Inc. | (main, May  6 2024, 19:46:43) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_parquet('PASP0001.parquet')
>>> df
       PA_CONDIC PA_GESTAO PA_CODUNI PA_DATREF PA_CODPRO  ...    PA_NUMAPA PA_CODOCO PA_CIDPRI PA_CIDSEC PA_MORFOL
0             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
1             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
2             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
3             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
4             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
...          ...       ...       ...       ...       ...  ...          ...       ...       ...       ...       ...
725999        MP    354870    016046    200001   1101138  ...  00011401577       S01      Q610      N180          
726000        MP    354870    016046    200001   1101138  ...  00011401588       S01      N039      N180          
726001        MP    354870    016046    200001   1101138  ...  00011401599       S01      N039      N180          
726002        MP    354870    016046    200001   1101138  ...  00011401600       S01      I10       N180          
726003        MP    354870    016046    200001   1101138  ...  00011401610       S01      N390      N180          

[726004 rows x 24 columns]
>>> 

But when I execute it in VSCode (I’m new to VSCode) I get an error message:

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.
 - Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
 - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

In both cases the same Python interpreter is being used (screenshot of the interpreter selected in VSCode omitted).

I’m working on WSL Ubuntu.

I need help!!

3 Answers


  1. Chosen as BEST ANSWER

    Thanks, Jay and Javad, for your kind answers. After trying many of your suggestions, I found how to fix it. First of all, I found where VSCode shows the interpreter in use. In fact, it is quite evident (VSCode screenshot omitted).

    Then, I discovered that if you start VSCode from a bash terminal using code ., you should do this from the folder that contains the .venv directory. I was doing code . with venv activated, but from a sub-directory.
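A minimal sketch of how one might find the right folder before running code ., assuming the virtual environment directory is named .venv as in this setup. The helper name find_venv_root is hypothetical; it only walks up the directory tree and does not touch VSCode.

```python
from pathlib import Path


def find_venv_root(start, venv_name=".venv"):
    """Return the nearest folder at or above `start` that contains `venv_name`, else None."""
    start = Path(start).resolve()
    for folder in (start, *start.parents):
        if (folder / venv_name).is_dir():
            return folder
    return None


if __name__ == "__main__":
    root = find_venv_root(Path.cwd())
    if root is None:
        print("No .venv directory found at or above the current folder.")
    else:
        # Hypothetical advice: start VSCode from this folder rather than a sub-directory.
        print(f"Folder containing .venv: {root}")
```

Run from a sub-directory such as pysus, this should print the parent folder that holds .venv, which is the folder to open VSCode from.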


  2. I have some questions that I hope will help clarify your setup. For now, I’ll answer as if all of the following are true:

    1. You use Windows, but within VSCode you’ve used Ctrl+Shift+P to "Connect to WSL" in a new window, so VSCode is connected to WSL (Ubuntu), as indicated by the blue connection information in the bottom-left corner (screenshot omitted).
    2. You are opening a bash shell within VSCode (while connected to Ubuntu), activating your venv within that shell, then opening an interactive Python session in that shell to produce the demonstration you give above.
    3. Within VSCode, you can confirm in the bottom-right corner of the application that VSCode itself sees and is using the same .venv/bin/python interpreter that you used for your interactive session (screenshot omitted).

    My biggest suspicion is that your VSCode application isn’t actually using the same interpreter as when you’ve opened a bash shell, activated a venv and run an interactive session there. Bullet 3 above, confirming what the bottom-right corner indicates is the active interpreter, will probably be the most helpful thing to check on.
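One way to test this suspicion directly is to run the same small script in both places (the bash session and VSCode, or a notebook cell) and compare the output. This is only a sketch: describe_environment is a hypothetical helper name, and it relies on nothing beyond the standard library.

```python
import importlib.util
import sys


def describe_environment(modules=("pandas", "pyarrow", "fastparquet")):
    """Report the running interpreter and whether it can find each module."""
    report = {"executable": sys.executable, "prefix": sys.prefix}
    for name in modules:
        # find_spec returns None when the module is not installed for this interpreter.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the executable line differs between the two runs, VSCode is not using the venv’s interpreter, which would explain why pyarrow is found in one place and not the other.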

    My next suspicion is that you could be running VSCode in Windows. Trying to use a WSL-Ubuntu-built venv as the interpreter in VSCode running in Windows has failed for me before. It is possible to run VSCode on Windows without connecting to a WSL-Ubuntu session, then open a WSL terminal that is mounted at the location of your current working directory in your Windows filesystem. There, you could create a venv, activate it within the WSL-Ubuntu terminal, and run the scripts and interactive session that you want to run.

    Your VSCode running on Windows, however, won’t be able to see the venv’s Python at that location of the Windows filesystem because it isn’t there. The executable is actually on your Ubuntu filesystem and is referenced by the mounted WSL terminal when needed, but the VSCode application isn’t going to find it. I also suspect this could be a factor in your case because, in your screenshot, your selected interpreter box is outlined in orange.

    Another quick check you can do: can you navigate to .venv/bin/ and see a python executable there? If all is well, you should see the executable. But if you’re running on Windows and that .venv was created in a WSL terminal mounted there, you won’t see it in .venv/bin. When I try to use a Linux-built venv’s interpreter in VSCode on Windows, my bottom-right corner shows the interpreter in orange (screenshot omitted).
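The quick check on .venv/bin/ can also be done from Python. A minimal sketch, assuming the venv lives in a folder named .venv relative to where the script runs; check_venv_python is a hypothetical helper name.

```python
from pathlib import Path


def check_venv_python(venv_dir=".venv"):
    """Return the expected Linux-style interpreter path and whether it exists here."""
    python_path = Path(venv_dir) / "bin" / "python"
    # exists() follows symlinks, so a link whose target is unreachable reports False.
    return python_path, python_path.exists()


if __name__ == "__main__":
    path, found = check_venv_python()
    print(f"{path}: {'found' if found else 'not visible from this environment'}")
```

Running it from the project folder in the WSL terminal and again from whatever VSCode uses to run your code would show whether both actually see the same executable.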

    This is an attempt to give you some things to try to make sure VSCode is seeing your WSL-Ubuntu venv’s python executable. I hope this at least points you in a good direction.

    Edit: I see from your answer you’re using a Jupyter notebook, which I hadn’t realized from your original post. Glad to see you’ve gotten it to work!

  3. Even if the interpreter is the same as the one in your environment, VSCode might manage its own environment settings separately. As a result, VSCode may not see the libraries you need, for example because the environment was not activated.

    Open a terminal in VSCode, activate the environment, and install the missing libraries, pyarrow and fastparquet:

    source /home/vado/python-projects/SUSano/.venv/bin/activate
    pip install pyarrow
    pip install fastparquet
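If it is unclear which environment a bare pip command installs into, one option is to bind the install to the interpreter that is actually running your code. This is a sketch under that assumption; the helper names pip_install_command and install_with_current_interpreter are hypothetical, and the install itself is left as a commented-out call.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def install_with_current_interpreter(package):
    """Run pip for the current interpreter; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Example (not executed here): install one parquet engine for this interpreter.
# install_with_current_interpreter("pyarrow")
```

According to the error message, pandas needs only one of the two engines, so installing pyarrow alone should be enough to make read_parquet work.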
    