When I run the pandas get_dummies() function, it raises a KeyError stating that none of my columns exist. The following code uses copyrighted data, which I am citing: the UCI Machine Learning Repository's adult dataset. Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

I am unsure what to try.

age, workclass, fnlwgt, education, education-num, marital-status, occupation, forces, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country,
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
37, Private, 284582, Masters, 14, Married-civ-spouse, Exec-managerial, Wife, White, Female, 0, 0, 40, United-States, <=50K
49, Private, 160187, 9th, 5, Married-spouse-absent, Other-service, Not-in-family, Black, Female, 0, 0, 16, Jamaica, <=50K
52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K
#import modules
import pandas as pd

#define functions
def open_infile():
    d = pd.read_csv('adult.data.txt', sep = ',')
    return d

def onehot_encode(data):
    data = pd.get_dummies(data, columns = ['workclass', 'education', 'marital-status', 'occupation', 'forces',
                                         'relationship', 'race', 'sex', 'native-country'])
    return data
##########gather data##########
#open infile
data = open_infile()
print(len(data))

##########process data##########
#one-hot encode categorical columns
onehot_encode(data)
print(data.head())
Traceback (most recent call last):
  File "C:/Users/Hezekiah/PycharmProjects/Artificial Intelligence 0/Chapter 1 Application Adult.py", line 20, in <module>
    onehot_encode(data)
  File "C:/Users/Hezekiah/PycharmProjects/Artificial Intelligence 0/Chapter 1 Application Adult.py", line 11, in onehot_encode
    'relationship', 'race', 'sex', 'native-country'])
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\reshape\reshape.py", line 812, in get_dummies
    data_to_encode = data[columns]
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
    raise_missing=True)
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "C:\Users\Hezekiah\PycharmProjects\Artificial Intelligence 0\venv\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [Index(['workclass', 'education', 'marital-status', 'occupation', 'forces',\n       'relationship', 'race', 'sex', 'native-country'],\n      dtype='object')] are in the [columns]"

I expect the pandas get_dummies() function to convert all categorical attributes into numerical ones, but instead PyCharm reports a KeyError telling me that none of my columns exist, when clearly they do.

2 Answers


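  1. The problem is extra whitespace in the column names: the header row has a space after each comma, so pandas reads the names as ' workclass', ' education', and so on. The solution is to use str.strip: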

    data.columns = data.columns.str.strip()
    

    Or use a list comprehension with strip:

    data.columns = [x.strip() for x in data.columns]
    
  2. Your main problem is how the data was put together when merging adult.names with the adult.data file.
    There is no forces column in the data on the website you mentioned. If you merge the data correctly, you will not get this error either.

    You are even using this column when creating the dummies.
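
The two answers can be combined into one rough sketch: clean the column names when loading, then encode only the columns that are really there. The three-row sample, the `load` helper and the reworked `onehot_encode` signature are made up for illustration and are not part of the original code. `skipinitialspace` is a standard `pd.read_csv` option that skips the space after each delimiter.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same style as the question's file:
# every field after the first is preceded by a space.
SAMPLE = """age, workclass, education, sex, hours-per-week
39, State-gov, Bachelors, Male, 40
50, Self-emp-not-inc, Bachelors, Male, 13
28, Private, Masters, Female, 40
"""

def load(source):
    # skipinitialspace=True skips the space that follows each comma
    data = pd.read_csv(source, sep=",", skipinitialspace=True)
    # Defensive: strip any whitespace still left around the column names
    data.columns = data.columns.str.strip()
    return data

def onehot_encode(data, columns):
    # Encode only the requested columns that exist, and report the rest
    present = [c for c in columns if c in data.columns]
    missing = [c for c in columns if c not in data.columns]
    if missing:
        print("Skipping columns not found in the data:", missing)
    return pd.get_dummies(data, columns=present)

# Without any cleaning, the names keep their leading space, e.g. ' workclass'
raw = pd.read_csv(io.StringIO(SAMPLE), sep=",")
print(raw.columns.tolist())

data = load(io.StringIO(SAMPLE))
# 'forces' is not in this sample, so it is reported and skipped
encoded = onehot_encode(data, ["workclass", "education", "sex", "forces"])
print(encoded.columns.tolist())
```

Note that `pd.get_dummies` returns a new DataFrame rather than modifying its argument, so the result has to be assigned. In the question, `onehot_encode(data)` discards the return value, which means `print(data.head())` would still show the unencoded columns even once the KeyError is gone.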
