
I am on Intel DevCloud and using Intel oneAPI. This is my code so far:

# first block of jupyter notebook
import modin.pandas as pd

# second block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()
# output of second block

UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

2023-09-01 12:00:16,471 INFO worker.py:1636 -- Started a local Ray instance.

The first block runs properly, but when I read my dataset, I get this warning followed by a "server unavailable" error.


If I use import pandas as pd, the code runs fine, but modin.pandas does not work. My dataset is a CSV file of about 1 GB. Why is this happening?

How to Reproduce this?

System Information

  • OS – Linux 90-Ubuntu 5.4.0-80-generic
  • CPU – Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
  • RAM – 188 GB

2 Answers


  1. Step 1: Check that Modin is installed properly. If you are unsure, reinstall Modin together with its dependencies:

    pip install "modin[all]"  # Modin plus all execution engines and their dependencies
    

    Step 2: Import Modin and read the CSV again:

    import modin.pandas as pd
    df = pd.read_csv('dataset.csv')  # try with the .csv file next to the notebook, not inside a subfolder
    

    Run this and check whether the error still appears.

    Reference:
    The pandas library offers user-friendly data structures, including DataFrames, for data analysis. However, it may perform slowly with extensive datasets (e.g., 100 GB or 1 TB) since it wasn’t optimized for such large volumes. Fortunately, the Modin library addresses this by allowing you to scale pandas workflows with just one code change.

  2. This was answered in Intel DevCloud support; please take a look there.

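The warning in the question recommends calling ray.init() before any dataframe operation. Below is a minimal sketch of doing that explicitly. It assumes, without confirmation from this thread, that the failure is related to Ray starting with default settings on a shared DevCloud node, so it caps the CPU count and the object store size. The helper names ray_init_kwargs and load_with_modin and the default limits (8 CPUs, 4 GiB) are hypothetical and chosen only for illustration.

```python
import os


def ray_init_kwargs(max_cpus=8, object_store_gib=4):
    """Build conservative keyword arguments for ray.init().

    max_cpus and object_store_gib are illustrative defaults, not tuned values.
    """
    available = os.cpu_count() or 1
    return {
        "num_cpus": min(available, max_cpus),               # never ask for more cores than exist
        "object_store_memory": object_store_gib * 1024**3,  # Ray expects bytes
        "ignore_reinit_error": True,                        # safe to re-run the notebook cell
        "include_dashboard": False,                         # skip the dashboard process
    }


def load_with_modin(csv_path, **limits):
    """Start Ray explicitly, then read the CSV through Modin."""
    # Imported lazily so ray_init_kwargs() can be inspected without Ray installed.
    import ray
    import modin.config as cfg

    cfg.Engine.put("ray")  # select the engine before the first dataframe operation
    ray.init(**ray_init_kwargs(**limits))

    import modin.pandas as pd
    return pd.read_csv(csv_path)
```

In the notebook, df = load_with_modin('dataset/dataset.csv') would replace the bare pd.read_csv call. If the error persists with these limits, the cause is probably elsewhere, for example in the Modin installation covered in answer 1.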