Ubuntu - Intel Server Unavailable after executing the code

AdarshWase
October 6, 2023
316 views
0 votes
2 Answers

I am on intel dev cloud and using Intel OneAPI. This is my code till now:

# first block of jupyter notebook
import modin.pandas as pd

# second block of jupyter notebook
df = pd.read_csv('dataset/dataset.csv')
df.head()

# output of second block

UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

2023-09-01 12:00:16,471 INFO worker.py:1636 -- Started a local Ray instance.

The first block is running properly but, when I am reading my dataset, it is giving me this warning and server unavailable error.

If I use import pandas as pd, the code is running fine, but modin.pandas is not working. My dataset is ~ 1 GB csv file. Why is this happening???

How to Reproduce this?

Step 1 – https://devcloud.intel.com/oneapi/
Step 2 – click on Getting
Started tab.
Step 3 – Go down and Click on launch JupyterLab. (It is
like Google Colab or Kaggle Notebooks)
Step 4 – Create ipynb and use
wget to download this data.
Data – !wget
https://s3-ap-southeast-1.amazonaws.com/he-public-data/datasetab75fb3.zip

System Information

OS – Linux 90-Ubuntu 5.4.0-80-generic
CPU – Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
RAM – 188 GB

Answers

- xPetersue
- September 1, 2023 at 9:28 pm
- 0 votes
0
Step 1: Check if you have installed modin properly. If you are unsure, try to reinstall the relevant modin dependencies etc.
```
pip install “modin[all]” # dependencies + modin execution engines
```
Step 2: import modin.pandas as pd
```
df = pd.read_csv('dataset.csv') #Please avoid placing your .csv file under a folder.
```
Let’s see what will happen.

Reference:
The pandas library offers user-friendly data structures, including DataFrames, for data analysis. However, it may perform slowly with extensive datasets (e.g., 100 GB or 1 TB) since it wasn’t optimized for such large volumes. Fortunately, the Modin library addresses this by allowing you to scale pandas workflows with just one code change.
Login or Signup to reply.