
Python for Data: (3) Linear regression via Normal Equation

Dear Folks..!! 

In this blog we are gonna learn how to perform linear regression via the normal equation, working with several matrix decomposition algorithms along the way. Although every decomposition will give us the same result, they are different ways of doing the same thing and vary in complexity.

Once again we are working with the "house.csv" data set. Following are the objectives that we are gonna fulfill in this blog.

1. Download the data-set from here and load it as data. [Hint:] from the loaded data you need to separate ydata, i.e. the sale prices of the houses, which is your target.

2. Choose the columns that can help you in prediction, i.e. that contain some useful information. You can drop irrelevant columns.

3. Split your dataset Xdata, ydata into Xtrain, ytrain and Xtest, ytest, i.e. randomly assign 80% of the data to the Xtrain, ytrain set and the remaining 20% to the Xtest, ytest set.

4. Implement the learn-linreg-NormEq algorithm and learn a parameter vector using the Xtrain set. You have to learn a model that predicts the sale prices of the houses in the test set.

5. Line 4 of learn-linreg-NormEq uses SOLVE-SLE. You have to replace SOLVE-SLE with the following options. For each option you will learn a separate set of parameters.
(a) Gaussian elimination
(b) Cholesky decomposition
(c) QR decomposition

To fulfill the above objectives we can either use the Scikit-learn library or write the code line by line using just numpy and similar libraries. For beginners it is recommended to start from scratch and get your hands dirty writing every line of code rather than relying on powerful libraries. Therefore, let's get our hands dirty.

These are the dependencies that we will be using throughout this blog. Since most of you probably already know numpy and pandas, we will not discuss them here; beginners should read the documentation for better insight.

Let's load the data-set. 
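As a runnable sketch, loading could look like the snippet below. The real house.csv is larger; the few rows and the exact column names embedded here are my assumptions, used only so the snippet is self-contained.

```python
import io
import numpy as np
import pandas as pd

# A tiny stand-in for house.csv (assumed rows and column names);
# in the blog you would simply call pd.read_csv("house.csv").
csv_text = """home,price,sqft,bedrooms,bathrooms,offers,brick,nbhd
1,114300,1790,2,2,2,No,East
2,114200,2030,4,2,3,No,East
3,123590,1740,3,2,1,Yes,West
4,110500,1980,3,2,3,No,North
"""

data = pd.read_csv(io.StringIO(csv_text))
print(data.head(10))  # inspect the top rows of the loaded data
```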

Here we can see the top 10 rows of our loaded data. One thing to notice is that some columns have string values (brick, nbhd) that can't be used directly in calculations. We need to convert them into pandas 'dummies', i.e. 0/1 indicator columns; these numbers represent the yes/no for the column 'brick'.
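The conversion uses pd.get_dummies; a small sketch (the example values are illustrative) looks like this:

```python
import pandas as pd

# Small frame with the two string columns mentioned above.
df = pd.DataFrame({"price": [114300, 123590, 110500],
                   "brick": ["No", "Yes", "No"],
                   "nbhd": ["East", "West", "North"]})

# get_dummies replaces each string column with 0/1 indicator columns;
# drop_first avoids one redundant (perfectly collinear) column per category.
df = pd.get_dummies(df, columns=["brick", "nbhd"], drop_first=True, dtype=int)
print(df.columns.tolist())  # e.g. ['price', 'brick_Yes', 'nbhd_North', 'nbhd_West']
```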

Now we can see that all our string values have been converted successfully into dummies. 
Our target is to predict the house price, so the column 'price' will be our y, i.e. the target. 
The next step is to drop some irrelevant columns that have little or no impact on our prediction of the house price. 
From the columns we are dropping two. You can use a box plot to see how much effect a particular column has on predicting the price. For more detail see the blog "getting started with pandas". 
Now we are done with our data pre-processing. Therefore, we can move ahead and divide the cleaned data into train and test sets. Here is a nice blog that will give you really good insight into training and testing data sets & why machine learners split their data.
Here we divided our Xdata into X_train and X_test, and similarly split ydata (our price column) into y_train and y_test. We can also check how much linear relationship our data and target have: a score of 0 means there is no linear relationship at all, and 1 means the data is perfect for linear regression.

Although this is not part of our exercise or objectives, we did it just to give you some insightful information. Remember, the above two lines of code can't be run directly; they are borrowed from sk-learn's linear regression score and you need separate code and libraries to do so.

(a) Gaussian Elimination
In linear algebra, Gaussian elimination (also known as row reduction) is an algorithm for solving systems of linear equations. It is usually understood as a sequence of operations performed on the corresponding matrix of coefficients. This method can also be used to find the rank of a matrix, to calculate the determinant of a matrix.
To perform row reduction on a matrix, one uses a sequence of elementary row operations to modify the matrix until the lower left-hand corner of the matrix is filled with zeros, as much as possible. There are three types of elementary row operations: 1) swapping two rows, 2) multiplying a row by a non-zero number, 3) adding a multiple of one row to another row. Using these operations, a matrix can always be transformed into an upper triangular matrix, and in fact one that is in row echelon form. Once all of the leading coefficients (the left-most non-zero entry in each row) are 1, and every column containing a leading coefficient has zeros elsewhere, the matrix is said to be in reduced row echelon form. This final form is unique; in other words, it is independent of the sequence of row operations used. - Wikipedia
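As a sketch of the idea, Gaussian elimination with partial pivoting for solving a system Ax = b can be written like this (the function name and the small example system are my own choices, not the blog's original code):

```python
import numpy as np

def solve_gauss(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: zero out everything below the diagonal.
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))           # pivot row (largest entry)
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]   # swap rows k and p
        for i in range(k + 1, n):
            f = A[i, k] / A[k, k]
            A[i, k:] -= f * A[k, k:]
            b[i] -= f * b[k]
    # Back substitution on the resulting upper triangular system.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
x = solve_gauss(A, b)
print(x)   # the classic textbook system: x = [2, 3, -1]
```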

Above one can see the Gaussian elimination function used to decompose a matrix. You can read the comments to understand every part of the function. The function takes a matrix as input, called 'm' here, and decomposes it; 'm' is nothing but our Xdata.
Here we have the normal-equation formula for linear regression, β = (XᵀX)⁻¹ Xᵀ y; the code above implements this same formula.
This is the pseudo code to perform linear regression via normal equation. 
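A minimal numpy sketch of learn-linreg-NormEq might look like the following. The pluggable solve_sle argument (which is where Gaussian elimination, Cholesky, or QR slots in for SOLVE-SLE) and the synthetic data are my assumptions:

```python
import numpy as np

def learn_linreg_normeq(X, y, solve_sle=np.linalg.solve):
    """Normal-equation linear regression.

    Builds A = X'X and b = X'y, then solves A @ beta = b with the
    supplied SOLVE-SLE routine (Gaussian elimination, Cholesky, QR, ...).
    """
    X = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    A = X.T @ X
    b = X.T @ y
    return solve_sle(A, b)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1]   # noiseless synthetic target
betas = learn_linreg_normeq(X, y)
print(betas)   # recovers [3.0, 1.5, -2.0] up to rounding
```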

Here we are gonna see how much error we made in predicting the house prices, by subtracting the predicted y_hat from the actual y_test.
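The residuals and RMSE can be computed in a few lines (the price values below are illustrative stand-ins, not the blog's actual results):

```python
import numpy as np

# Stand-in actual prices vs. predictions (illustrative numbers only).
y_test = np.array([114300.0, 123590.0, 110500.0, 118000.0])
y_hat  = np.array([115000.0, 122000.0, 111000.0, 117500.0])

residuals = y_test - y_hat                 # error per house
avg_residual = residuals.mean()            # average residual
rmse = np.sqrt(np.mean(residuals ** 2))    # root-mean-square error
print(avg_residual, rmse)
```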

These are the errors with respect to our actual y (price). The average residual & root-mean-square error (RMSE) for the Gaussian elimination decomposition are as follows. 
Now we will do the same thing with the Cholesky decomposition method.

(b) Cholesky Decomposition Method

In linear algebra, the Cholesky decomposition or Cholesky factorization is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful e.g. for efficient numerical solutions and Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations. 

A particularly easy factorization to perform is the Cholesky factorization: any positive semidefinite matrix h has a factorization of the form h = gg′, where g is a lower triangular matrix. Solving for g is straightforward.
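A compact sketch of the factorization (function name and example matrix are mine; the classic 3x3 example is a standard positive-definite test case):

```python
import numpy as np

def cholesky(h):
    """Return lower triangular g with h = g @ g.T (h positive definite)."""
    n = h.shape[0]
    g = np.zeros_like(h, dtype=float)
    for i in range(n):
        for j in range(i + 1):
            s = h[i, j] - g[i, :j] @ g[j, :j]          # subtract known terms
            g[i, j] = np.sqrt(s) if i == j else s / g[j, j]
    return g

h = np.array([[4.0, 12.0, -16.0],
              [12.0, 37.0, -43.0],
              [-16.0, -43.0, 98.0]])
g = cholesky(h)
print(g)   # matches np.linalg.cholesky(h)

# Solving A @ x = b then reduces to two triangular solves:
#   g @ z = b (forward substitution), then g.T @ x = z (back substitution).
```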

The Python function above decomposes a matrix via Cholesky factorization. Just read the code line by line with the comments and you will understand it very well; it's pretty straightforward. Here is a good Wikipedia article for better insight.

Here we are calculating our parameters, called 'betas'. 
The part of the code above computes the error and visualizes it, similar to the Gaussian method. 
You can see that there is no difference in the errors obtained from the two decompositions.

(c) QR matrix decomposition for linear regression

Here we can reuse the backward-substitution function that we created for the Cholesky decomposition.
The QR decomposition, also known as the QR factorization, is another method of solving linear systems of equations using matrices, very much like the LU decomposition. The equation to solve is of the form Ax = b, where A = QR; in this case A is the product of an orthogonal matrix Q and an upper triangular matrix R. The QR algorithm is commonly used to solve the linear least-squares problem.
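A sketch of QR-based least squares with numpy (the back-substitution helper and the synthetic design matrix are my assumptions):

```python
import numpy as np

def back_substitute(R, b):
    """Solve the upper triangular system R x = b."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])
y = X @ np.array([3.0, 1.5, -2.0])   # noiseless synthetic target

# Factor X itself: X beta = y becomes R beta = Q' y, so the X'X
# product is never formed, which is numerically better conditioned.
Q, R = np.linalg.qr(X)               # reduced QR: Q is 40x3, R is 3x3
betas = back_substitute(R, Q.T @ y)
print(betas)   # recovers [3.0, 1.5, -2.0] up to rounding
```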
Here we are done with linear regression via the normal equation. In this blog we learnt three major matrix decomposition methods and how they work.

What's Next 

Optimization techniques 
Gradient Descent for regression