
Logistic Regression with Numpy and Pandas

Import Libraries


Import a few libraries you think you'll need (or just import them as you go along!)

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Getting the data: if the data is in a CSV file, we can import it as below. The pandas library includes many other functions for importing and preprocessing data.

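# the CSV file is assumed to be in the current working directory
dataset = pd.read_csv('Social_Network_Ads.csv')
# features: the columns at positions 2 and 3 (age and salary, judging by the output below)
X = dataset.iloc[:, [2, 3]].values
# target: the column at position 4, a 0/1 label
y = dataset.iloc[:, 4].values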
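Let's see how our data looks after importing. dataset.head() shows the first five rows of the data frame; iloc selects rows and columns by integer position, and .values converts that selection into a NumPy array.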

dataset.head()
X
array([[    19,  19000],
       [    35,  20000],
       [    26,  43000],
       [    27,  57000],
       [    19,  76000],
       [    27,  58000],
       [    27,  84000],...])
y
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, ....])
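Since Social_Network_Ads.csv isn't reproduced here, the sketch below uses a tiny hand-made DataFrame to show what the iloc selection returns. The column names and the User ID and Gender values are assumptions for illustration; the age, salary and label values are the first three rows of the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Social_Network_Ads.csv: column names and the
# "User ID"/"Gender" values are assumptions made for illustration only.
dataset = pd.DataFrame({
    "User ID": [101, 102, 103],
    "Gender": ["Male", "Female", "Male"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 0],
})

# iloc selects by integer position: all rows, columns 2 and 3
features = dataset.iloc[:, [2, 3]]   # still a pandas DataFrame
X = features.values                  # .values turns it into a NumPy array
y = dataset.iloc[:, 4].values        # a single column becomes a 1-D array

print(type(features).__name__)       # DataFrame
print(type(X).__name__, X.shape)     # ndarray (3, 2)
print(y)                             # [0 0 0]
```

On newer pandas versions, .to_numpy() does the same job as .values and is the recommended spelling.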

Splitting the dataset into the Training set and Test set

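It is always recommended to split the data, train on one part, and measure accuracy on the part the model has never seen. You can play around with test_size and random_state: test_size is the fraction of the data held out for testing (0.25 means 25%), and random_state seeds the random shuffling so that the same split is produced on every run.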


from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
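As a quick check of what test_size and random_state do, here is a sketch on synthetic data (random numbers, not the real CSV). Using 400 rows is an assumption, chosen so that a 0.25 split leaves 100 test rows, the same as the total support in the classification report further down.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption, not the real CSV): 400 rows, 2 features
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = rng.integers(0, 2, size=400)

# test_size=0.25 sends 25% of the rows to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # (300, 2) (100, 2)

# Re-running with the same random_state reproduces exactly the same split
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.25, random_state=0)
print(np.array_equal(X_test, X_test2))  # True
```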

Feature Scaling

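In real data, the columns are often on very different scales, like the number of rooms compared with the size of a room (here, age in years versus salary in the tens of thousands), so it is good practice to scale the features before feeding them to the training model. The scaler is fitted on the training set only and then applied unchanged to the test set, so no information from the test data leaks into training.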

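from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# learn each column's mean and standard deviation from the training set, then scale it
X_train = sc.fit_transform(X_train)
# reuse the training statistics on the test set
X_test = sc.transform(X_test)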
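StandardScaler subtracts each column's mean and divides by its standard deviation, z = (x - mean) / std. The sketch below, using a few made-up [age, salary] rows, checks that the same result can be computed by hand in NumPy, with both statistics taken from the training set only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up [age, salary] rows (illustrative values only)
X_train = np.array([[19, 19000], [35, 20000], [26, 43000], [27, 57000]], dtype=float)
X_test = np.array([[30, 87000], [45, 26000]], dtype=float)

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

# The same transformation by hand; statistics come from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses
manual_train = (X_train - mean) / std
manual_test = (X_test - mean) / std

print(np.allclose(sc.mean_, mean))                 # True
print(np.allclose(X_train_scaled, manual_train))   # True
print(np.allclose(X_test_scaled, manual_test))     # True
```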

Fitting Logistic Regression to the Training set

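We can import libraries wherever we need them; it's not like C++ :). When defining the classifier object for logistic regression, we can play with the value of random_state, although scikit-learn only uses it to shuffle the data when the solver is 'sag', 'saga' or 'liblinear'.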
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
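Since the title mentions NumPy, here is a minimal sketch of what fitting a logistic regression means, written with NumPy alone. It models P(y = 1 | x) = sigmoid(w·x + b) and minimizes the mean log-loss with plain batch gradient descent. This is a swapped-in teaching technique, not what scikit-learn does internally (by default LogisticRegression adds an L2 penalty with C=1.0 and uses a quasi-Newton solver, lbfgs in recent versions). The function names, learning rate, iteration count and synthetic data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iter=5000):
    """Hypothetical helper: batch gradient descent on mean log-loss, no regularization."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)   # predicted probability of class 1
        error = p - y            # derivative of the log-loss w.r.t. the linear score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

def predict(X, w, b, threshold=0.5):
    # Class 1 whenever the predicted probability reaches the threshold
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Synthetic, already-standardized data (an assumption): class 1 when x0 + x1 is large
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=300) > 0).astype(int)

w, b = fit_logistic_gd(X, y)
accuracy = (predict(X, w, b) == y).mean()
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

Because the features are standardized first, a single fixed learning rate behaves reasonably here; on the raw age and salary columns the same loop would likely need a much smaller step size to avoid diverging.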

Predicting the Test set results


y_pred = classifier.predict(X_test)
classifier.score(X_test,y_test)
    0.89000000000000001
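To make predict and score less of a black box, the sketch below (again on synthetic data, an assumption) checks two things for a binary LogisticRegression: predict agrees with thresholding the class-1 probability from predict_proba at 0.5, and score returns plain accuracy, the fraction of correct predictions. With the tutorial's numbers, 0.89 means 89 of the 100 test rows were classified correctly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption); the tutorial uses the scaled CSV features
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

classifier = LogisticRegression(random_state=0)
classifier.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] for each row
proba = classifier.predict_proba(X)[:, 1]
y_pred = classifier.predict(X)

# For a binary problem, predict() amounts to thresholding P(class 1) at 0.5
print(np.array_equal(y_pred, (proba > 0.5).astype(int)))  # True

# score() for classifiers is accuracy: the fraction of correct predictions
print(np.isclose(classifier.score(X, y), np.mean(y_pred == y)))  # True
```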
from sklearn.metrics import confusion_matrix,classification_report
print(confusion_matrix(y_test, y_pred))
    [[65 3]
     [ 8 24]]
print(classification_report(y_test,y_pred))
             precision    recall  f1-score   support

          0       0.89      0.96      0.92        68
          1       0.89      0.75      0.81        32

avg / total       0.89      0.89      0.89       100
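The classification report can be reproduced from the confusion matrix above. In scikit-learn's confusion_matrix, rows are the true classes and columns the predicted classes, so for class 1 there were 24 true positives, 8 false negatives and 3 false positives. A short NumPy sketch of the arithmetic:

```python
import numpy as np

# Confusion matrix from the tutorial: rows = true class, columns = predicted class
cm = np.array([[65, 3],
               [8, 24]])

tp = np.diag(cm)                  # correct predictions per class: [65, 24]
precision = tp / cm.sum(axis=0)   # column sums: how often each class was predicted
recall = tp / cm.sum(axis=1)      # row sums: true count per class (the "support")
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

support = cm.sum(axis=1)
weighted_f1 = (f1 * support).sum() / support.sum()

print(precision.round(2))    # [0.89 0.89]
print(recall.round(2))       # [0.96 0.75]
print(f1.round(2))           # [0.92 0.81]
print(accuracy)              # 0.89
print(round(weighted_f1, 2)) # 0.89
```

The "avg / total" row is the support-weighted average of the per-class scores; newer scikit-learn versions print separate "accuracy", "macro avg" and "weighted avg" rows instead.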

  • Visualization of Logistic Regression Working