Amazon web services - Not seeing column names when reading csv from s3 in pandas

MuhammadKamil
December 12, 2023
246 views
0 votes
2 Answers

I am using the following bit of code to read the iris dataset from an s3 bucket.

```
import pandas as pd
import s3fs

s3_path = 's3://h2o-public-test-data/smalldata/iris/iris.csv'

s3 = s3fs.S3FileSystem(anon=True)
with s3.open(s3_path, 'rb') as f:
    df = pd.read_csv(f, header = True)
```

However, the column names are just the contents of the first row of the dataset. How do I fix that?

Answers

- BhaveshParvatkar
- December 11, 2023 at 6:35 pm
- 0 votes
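A few things need attention:
1. The s3:// prefix in s3_path is optional with s3.open; s3fs accepts the path with or without it.
2. iris.csv has no header row. If you need a file with a header, use iris_wheader.csv instead.
3. In read_csv, header does not accept a boolean. Pass a row number such as header=0 (or leave the default 'infer') when the first line holds the column names, or header=None when the file has no header row.
Your final code should look something like this: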
```
import s3fs
import pandas as pd

s3 = s3fs.S3FileSystem(anon=True)

with s3.open('h2o-public-test-data/smalldata/iris/iris_wheader.csv', 'rb') as f:
    df = pd.read_csv(f, header=0)
    print(df.head())
```
Edit: You can also read the file directly with pandas:
```
import pandas as pd

df = pd.read_csv('s3://h2o-public-test-data/smalldata/iris/iris_wheader.csv', header=0, storage_options={
    "anon": True
})
print(df.head())
```
You still need s3fs installed, but you no longer have to open the file yourself.
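To see how the header values behave without touching S3, here is a minimal sketch on an in-memory CSV. The sample rows and column names are made up for illustration and are not the real contents of iris_wheader.csv. It assumes a reasonably recent pandas, which rejects a boolean header with an exception.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a CSV file; not the real S3 data.
csv_with_header = "sepal_len,sepal_wid,species\n5.1,3.5,setosa\n4.9,3.0,setosa\n"

# header=0: the first line supplies the column names.
df_named = pd.read_csv(io.StringIO(csv_with_header), header=0)

# header=None: every line is treated as data and the columns are numbered.
df_unnamed = pd.read_csv(io.StringIO(csv_with_header), header=None)

# header=True: a boolean is not a valid value, so pandas raises instead of parsing.
bool_error = None
try:
    pd.read_csv(io.StringIO(csv_with_header), header=True)
except (TypeError, ValueError) as exc:
    bool_error = exc

print(list(df_named.columns))
print(list(df_unnamed.columns))
print(repr(bool_error))
```

With header=None the header line ends up as the first data row, which is the symptom to look for when a file that does have a header is read as if it had none.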

- PeteKirkham
- December 11, 2023 at 6:41 pm
- 0 votes
See https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html for all the parameters.

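If your CSV has no row of column names, you can use the names parameter to supply the names you want. In that case, leave header unset (a boolean such as header=True is not a valid value anyway): when names is passed and header is omitted, pandas treats the file as having no header row, so the first line is read as data.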
```
df = pd.read_csv(file_path, names=['yan', 'tan', 'tetherer', 'mether', 'pip'])
```
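As a rough, self-contained illustration of the names parameter, the sketch below reads a headerless CSV from memory and then shows the variant for a file that does have a header you want to replace. The sample rows and the column names are hypothetical and only stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical headerless data in the shape of the iris file (two rows only).
headerless_csv = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"

# Illustrative column names; pick whatever suits your data.
column_names = ["sepal_len", "sepal_wid", "petal_len", "petal_wid", "species"]

# names without header: both lines are kept as data rows.
df = pd.read_csv(io.StringIO(headerless_csv), names=column_names)

# If the file does have a header line, combine names with header=0 so the
# existing header is consumed and replaced rather than read as a data row.
with_header_csv = "a,b,c,d,e\n" + headerless_csv
df_replaced = pd.read_csv(io.StringIO(with_header_csv), names=column_names, header=0)

print(df)
print(df_replaced)
```

Both frames end up with the same two data rows and the supplied column names.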