dataset.head()
X
array([[ 19, 19000],
       [ 35, 20000],
       [ 26, 43000],
       [ 27, 57000],
       [ 19, 76000],
       [ 27, 58000],
       [ 27, 84000], ...])
y
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, ....])
Splitting the dataset into the Training set and Test set
It is always recommended to split the data into training and test sets so you can measure accuracy on data the model has not seen. You can play around with test_size and random_state: they control the fraction of data held out for testing and the random selection of data points, respectively.
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
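As a quick sanity check of what those parameters do (using a small made-up array, not the ads dataset), test_size=0.25 holds out a quarter of the rows and random_state makes the shuffle reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the Age/Salary features above
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# test_size=0.25 keeps 25% of the rows for testing;
# random_state=0 fixes the shuffle so the split is repeatable
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo,
                                      test_size=0.25, random_state=0)
print(Xtr.shape, Xte.shape)  # (7, 2) (3, 2)
```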
Feature Scaling
In a dataset, the columns may sit on very different scales (e.g. the number of rooms versus the size of a room in square feet), so it is good practice to apply feature scaling before feeding the data to the training model.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
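To see what StandardScaler actually does (a sketch on toy numbers, not the ads data): after fit_transform, each column has zero mean and unit variance, and the test set should only be transformed with the statistics learned from the training set, as above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy features on very different scales (e.g. rooms vs. square footage)
X_toy = np.array([[2.0,  800.0],
                  [3.0, 1200.0],
                  [4.0, 2000.0]])

sc = StandardScaler()
X_scaled = sc.fit_transform(X_toy)  # learns column mean/std, then scales

# Each column now has mean 0 and (population) standard deviation 1
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```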
Fitting Logistic Regression to the Training set
We can import libraries as we need them (it's not like C++ :) ). While defining the classifier object for logistic regression, we can play with the value of random_state.
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
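It can help to see what predict() does under the hood: for binary logistic regression it thresholds the class-1 probability from predict_proba() at 0.5. A tiny one-feature sketch (toy data, not the ads dataset) illustrates this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny, linearly separable toy set with one scaled feature
X_toy = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_toy = np.array([0, 0, 1, 1])

clf = LogisticRegression(random_state=0).fit(X_toy, y_toy)

# predict() agrees with thresholding predict_proba() at 0.5
proba = clf.predict_proba(X_toy)[:, 1]
pred = clf.predict(X_toy)
print(np.array_equal(pred, (proba >= 0.5).astype(int)))  # True
```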
Predicting the Test set results
y_pred = classifier.predict(X_test)
classifier.score(X_test,y_test)
0.89
from sklearn.metrics import confusion_matrix,classification_report
print(confusion_matrix(y_test, y_pred))
[[65 3]
[ 8 24]]
print(classification_report(y_test,y_pred))
             precision    recall  f1-score   support

          0       0.89      0.96      0.92        68
          1       0.89      0.75      0.81        32

avg / total       0.89      0.89      0.89       100
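The headline numbers in the report can be recomputed by hand from the confusion matrix above (rows are true classes, columns are predicted classes), which makes it clear where each metric comes from:

```python
import numpy as np

# Confusion matrix from above: rows = true class, cols = predicted class
cm = np.array([[65,  3],
               [ 8, 24]])

tn, fp = cm[0]  # true negatives, false positives (true class 0)
fn, tp = cm[1]  # false negatives, true positives (true class 1)

accuracy = (tn + tp) / cm.sum()   # (65 + 24) / 100 = 0.89
precision_1 = tp / (tp + fp)      # 24 / 27 ~ 0.89
recall_1 = tp / (tp + fn)         # 24 / 32 = 0.75
print(round(accuracy, 2), round(precision_1, 2), round(recall_1, 2))
```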
Visualizing how Logistic Regression works
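A common way to visualize this (a sketch on synthetic 2-D data; the axis labels assume the scaled Age/Salary features from above) is to evaluate the classifier on a dense grid of points and colour the two decision regions with matplotlib:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Synthetic 2-D data standing in for the scaled Age/Salary features
rng = np.random.RandomState(0)
X_vis = rng.randn(100, 2)
y_vis = (X_vis[:, 0] + X_vis[:, 1] > 0).astype(int)

clf = LogisticRegression(random_state=0).fit(X_vis, y_vis)

# Predict on a dense grid, then colour each region by its predicted class
xx, yy = np.meshgrid(np.arange(-3, 3, 0.02), np.arange(-3, 3, 0.02))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X_vis[:, 0], X_vis[:, 1], c=y_vis, edgecolors="k")
plt.xlabel("Age (scaled)")
plt.ylabel("Estimated Salary (scaled)")
plt.title("Logistic Regression decision boundary")
plt.savefig("boundary.png")
```

The straight border between the two coloured regions is the model's linear decision boundary.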