Python for Data (11): MLP Classifier in Python (Multi-Layer Perceptron)


Multi-Layer Perceptron Classifier

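    Let's try to solve the Kaggle problem "Poker Rule Induction". The task is to predict the class of a five-card poker hand (the hand column, labeled 0 to 9) from the suits and ranks of its cards.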

#import libraries
import pandas as pd

Getting the data

sampleSub = pd.read_csv("sampleSubmission.csv")
test = pd.read_csv('test.csv')
train = pd.read_csv('train.csv')
    Separating the feature columns from the target column.
x = train.drop('hand',axis=1)
y = train['hand']
    
test.head()
x.head()
y.head()
    0    0    
    1    0
    2    2
    3    3
    4    0

Train-Test Splitting

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
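The split above is purely random, so rare hand classes can end up unevenly distributed between the two parts. As a minimal sketch of one possible alternative, the example below uses the stratify and random_state arguments of train_test_split on made-up data (X_demo and y_demo are hypothetical stand-ins, not the Kaggle files). Stratifying needs at least two samples of every class, which may not hold for the rarest poker hands.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 80 samples of class 0 and 20 of class 1.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 80 + [1] * 20)

# stratify keeps the class proportions the same in both parts;
# random_state makes the split reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Number of test samples per class.
test_counts = np.bincount(y_te)
print(test_counts)
```

With stratification the 80/20 class ratio of the full data is preserved in the held-out part.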

    Defining the model with three hidden layers of 200, 150 and 100 nodes and "relu" as the activation function
from sklearn.neural_network import MLPClassifier
classifier = MLPClassifier(hidden_layer_sizes=(200,150,100),activation='relu')
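To see what hidden_layer_sizes=(200,150,100) means in practice, the sketch below fits the same architecture on a small made-up dataset and inspects the shapes of the weight matrices stored in coefs_. The data (10 integer features, 3 classes) and the low max_iter are assumptions chosen only to keep the example fast; they are not the Kaggle data.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Made-up data: 60 samples, 10 integer features, 3 balanced classes.
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 14, size=(60, 10))
y_demo = np.arange(60) % 3

demo_clf = MLPClassifier(
    hidden_layer_sizes=(200, 150, 100),
    activation='relu',
    max_iter=5,        # deliberately short; we only want the weight shapes
    random_state=0,
)

# A short run will not converge, so silence the expected warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    demo_clf.fit(X_demo, y_demo)

# One weight matrix per connection between consecutive layers.
layer_shapes = [w.shape for w in demo_clf.coefs_]
print(layer_shapes)
```

Each matrix connects one layer to the next: inputs to 200 nodes, 200 to 150, 150 to 100, and 100 to one output unit per class.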

Fitting the training data to the model

classifier.fit(X_train,y_train)
The fitted model's parameters look like this (only hidden_layer_sizes differs from the defaults):
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(200, 150, 100), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

Prediction

pred = classifier.predict(X_test)

Model Evaluation

from sklearn.metrics import classification_report,confusion_matrix
classifier.score(X_test, y_test)

    0.92502998800479808
An accuracy of about 92.5% is not bad for a first attempt.
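For a classifier, score returns the mean accuracy, that is, the fraction of test samples whose predicted label matches the true one. The sketch below checks this on made-up data from make_classification with a deliberately small network; the dataset and the (20,) layer size are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Made-up binary classification data.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

demo_clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)
demo_clf.fit(X_tr, y_tr)

# score() and accuracy_score() on the predictions give the same number.
score_value = demo_clf.score(X_te, y_te)
manual_value = accuracy_score(y_te, demo_clf.predict(X_te))
print(score_value, manual_value)
```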

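print(classification_report(pred,y_test))
Note: classification_report expects its arguments in the order (y_true, y_pred). Since pred is passed first here, the precision and recall columns below are swapped relative to the usual convention, and support counts predicted labels rather than true labels.

                 precision    recall  f1-score   support

              0       0.99      0.94      0.96      2615
              1       0.93      0.93      0.93      2167
              2       0.46      0.75      0.57       141
              3       0.59      0.78      0.67        73
              4       0.08      0.40      0.13         5
              5       0.00      0.00      0.00         0
              6       0.00      0.00      0.00         0
              7       1.00      1.00      1.00         1
              8       0.00      0.00      0.00         0
              9       0.00      0.00      0.00         0

    avg / total       0.94      0.93      0.93      5002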


print(confusion_matrix(pred,y_test))
Note: confusion_matrix also expects (y_true, y_pred). With pred passed first, the rows are predicted labels and the columns are true labels.

    [[2455  134    0    0   15   10    0    0    0    1]
     [  27 2006  111   12    8    0    0    0    1    2]
     [   0    5  106   28    0    0    2    0    0    0]
     [   0    0   13   57    0    0    3    0    0    0]
     [   1    2    0    0    2    0    0    0    0    0]
     [   0    0    0    0    0    0    0    0    0    0]
     [   0    0    0    0    0    0    0    0    0    0]
     [   0    0    0    0    0    0    0    1    0    0]
     [   0    0    0    0    0    0    0    0    0    0]
     [   0    0    0    0    0    0    0    0    0    0]]
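The accuracy reported by classifier.score can be recovered from this matrix: it is the sum of the diagonal (correct predictions) divided by the total number of samples. The sketch below re-enters the matrix by hand and does that arithmetic. Because pred was passed as the first argument to confusion_matrix, rows index predicted labels and columns index true labels, so the row-wise ratio is the share of predictions for a class that were correct and the column-wise ratio is the share of true members of a class that were found.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: predicted label, columns: true label).
cm = np.array([
    [2455,  134,   0,  0, 15, 10, 0, 0, 0, 1],
    [  27, 2006, 111, 12,  8,  0, 0, 0, 1, 2],
    [   0,    5, 106, 28,  0,  0, 2, 0, 0, 0],
    [   0,    0,  13, 57,  0,  0, 3, 0, 0, 0],
    [   1,    2,   0,  0,  2,  0, 0, 0, 0, 0],
    [   0,    0,   0,  0,  0,  0, 0, 0, 0, 0],
    [   0,    0,   0,  0,  0,  0, 0, 0, 0, 0],
    [   0,    0,   0,  0,  0,  0, 0, 1, 0, 0],
    [   0,    0,   0,  0,  0,  0, 0, 0, 0, 0],
    [   0,    0,   0,  0,  0,  0, 0, 0, 0, 0],
])

# Overall accuracy: correct predictions over all samples.
accuracy = np.trace(cm) / cm.sum()

diag = np.diag(cm)
row_totals = cm.sum(axis=1)  # samples predicted as each class
col_totals = cm.sum(axis=0)  # samples truly in each class

# Per-class ratios; classes with a zero total are left at 0.
row_ratio = np.divide(diag, row_totals, out=np.zeros(10), where=row_totals > 0)
col_ratio = np.divide(diag, col_totals, out=np.zeros(10), where=col_totals > 0)

print(accuracy)
print(np.round(row_ratio, 2))
print(np.round(col_ratio, 2))
```

The first printed value matches the score of roughly 0.925 shown earlier. The per-class ratios show that this figure is dominated by classes 0 and 1, while the rarer hands are predicted far less reliably.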

Making Submission File

Making the test data suitable for prediction by dropping the id column

Test = test.drop('id',axis=1)
Test.head()
Retraining the model on the whole training set provided for the Kaggle problem.
classifier.fit(x,y)
pred = classifier.predict(Test)
Making the submission file in the given format
# predict() returns a NumPy array, which can be assigned to the column directly
sampleSub['hand'] = pred
sampleSub.to_csv('submit.csv',index=False)
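The submission can also be built without sampleSubmission.csv by pairing each test id with its prediction. The sketch below assumes the expected file has exactly two columns, id and hand, which is inferred from the code above rather than confirmed. The demo_ids and demo_pred arrays are hypothetical stand-ins for test['id'] and the model's predictions, and the CSV is written to an in-memory buffer instead of a file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for test['id'] and classifier.predict(Test).
demo_ids = np.array([1, 2, 3, 4])
demo_pred = np.array([0, 1, 0, 2])

# One row per test sample: its id and the predicted hand class.
submission = pd.DataFrame({'id': demo_ids, 'hand': demo_pred})

# Write to an in-memory buffer; pass a file name instead to save to disk.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False keeps the DataFrame's row index out of the file, so only the id and hand columns are written.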

