I'm working on an artificial intelligence assignment on pattern recognition (classifying whether the animals are bats or not).
My classifier is a neural network.
My code currently looks like this:
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
# Loading data from Names.txt and Wpc50.dat file
data = pd.read_csv("Wpc50.dat", sep=r'\s+', header=None, dtype=float)
species = pd.read_csv("Names.txt", header=None, names=["species"])
# Splitting the data into training and testing, 75% for training and 25% for testing.
data_train, data_test, species_train, species_test = train_test_split(data, species, test_size=0.25, random_state=0)
#Scaling of training data
scaler = StandardScaler()
data_train_normalized = scaler.fit_transform(data_train)
# Applying the same transformations on the test data
data_test_normalized = scaler.transform(data_test)
# Encoding of the Names.txt file, the names are in string and I need to transform them to numerical data
encoder = LabelEncoder()
species_encoded = encoder.fit_transform(species.values.ravel())
species_train, species_test, species_train_encoded, species_test_encoded = train_test_split(species, species_encoded, random_state=0)
species_train_encoded = encoder.transform(species_train.values.ravel())
encoder.handle_unknown = 'ignore'
species_test_encoded = encoder.transform(species_test.values.ravel())
# Creating the neural network
mlp = MLPClassifier(hidden_layer_sizes=(5, 2), max_iter=1000, random_state=0)
# Training the neural network
mlp.fit(data_train_normalized, species_train_encoded)
# Evaluating the model
score = mlp.score(data_test_normalized, species_test_encoded)
print("Accuracy:", score)
Anyway, no errors appear in my terminal, but the print() does not work.
Can someone help me, please?
It's my homework from college.
Maybe the problem is the neural network; I'll try something less complicated.
1 Answer
There are a couple of things you need to fix there. First of all, you are making a mistake in the preprocessing: encoder.fit_transform already transforms the data, so you don't need to do it again. The extra lines that transform the labels again after that are unnecessary.
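To illustrate the point about fit_transform, here is a minimal sketch with made-up labels (the species names below are hypothetical, not taken from Names.txt). It shows that a single fit_transform call already returns the integer codes, and that calling transform afterwards just reproduces them.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels, standing in for the contents of Names.txt
labels = ["bat", "bird", "bat", "mouse"]

encoder = LabelEncoder()

# One call both learns the classes and returns the integer codes
encoded = encoder.fit_transform(labels)

# Transforming the same labels again yields exactly the same codes
again = encoder.transform(labels)

# The codes can be mapped back to the original strings if needed
decoded = encoder.inverse_transform(encoded)

print(list(encoder.classes_), list(encoded), list(decoded))
```

As far as I know, LabelEncoder has no handle_unknown option, so assigning encoder.handle_unknown = 'ignore' only sets an unused attribute and has no effect on the encoding.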
Secondly, there is a bug in how you split the data. You split it at the top, and then you split the labels again, so you may end up fitting your network with features and labels that do not match. (It shouldn't be working, but I don't know.) I recommend moving the following line and deleting the other stuff.
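To illustrate the splitting point: a minimal sketch, using synthetic arrays rather than the real files, of passing features and labels to one train_test_split call so that each row stays paired with its own label. The array contents are assumptions chosen only to make the pairing easy to check.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: row i is [2*i, 2*i + 1] and its label is i,
# so a row and its label can be checked against each other.
features = np.arange(20).reshape(10, 2)
labels = np.arange(10)

# A single call shuffles features and labels with the same permutation
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# Every row still carries its own label after the split
rows_match = bool(
    np.all(X_train[:, 0] == 2 * y_train) and np.all(X_test[:, 0] == 2 * y_test)
)
print(rows_match)
```

In the question's code both split calls use random_state=0 and an effective test size of 0.25, which is probably why the two splits happened to line up. Relying on that is fragile, so a single call is the safer choice.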
So the end result would be something like the following:
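A minimal sketch of what such an end result could look like, assuming the same Wpc50.dat and Names.txt files as in the question. It is not necessarily the exact code the answer had in mind: the function name train_and_score and the variables y_train and y_test are introduced here for illustration. The labels are encoded once, before a single split, and the scaler is fitted on the training data only.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler


def train_and_score(data_path, names_path):
    """Train the MLP on one aligned split and return the test accuracy."""
    # Whitespace-separated numeric features, one sample per row
    data = pd.read_csv(data_path, sep=r"\s+", header=None, dtype=float)
    species = pd.read_csv(names_path, header=None, names=["species"])

    # Encode the string labels once, before splitting
    encoder = LabelEncoder()
    species_encoded = encoder.fit_transform(species["species"])

    # One split call keeps every feature row paired with its label
    data_train, data_test, y_train, y_test = train_test_split(
        data, species_encoded, test_size=0.25, random_state=0
    )

    # Fit the scaler on the training data, then reuse it on the test data
    scaler = StandardScaler()
    data_train_normalized = scaler.fit_transform(data_train)
    data_test_normalized = scaler.transform(data_test)

    mlp = MLPClassifier(hidden_layer_sizes=(5, 2), max_iter=1000, random_state=0)
    mlp.fit(data_train_normalized, y_train)
    return mlp.score(data_test_normalized, y_test)


# Runs only when the two files from the question are present
if Path("Wpc50.dat").exists() and Path("Names.txt").exists():
    print("Accuracy:", train_and_score("Wpc50.dat", "Names.txt"))
```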