
Python for Data: (4) Predicting Flight Price with Tensorflow

Welcome back, everyone, to the world of Machine Learning!
In this blog post we are going to predict flight prices from the distance each flight travels. We will work with the "airq402.dat.text" data set, which we already explored in our earlier Gradient Descent tutorial (that can be found here). This time we will use TensorFlow, a powerful library for processing tensors: data that is not limited to vectors or matrices but can have any number of dimensions. That is why data scientists often reach for TensorFlow when a data set is n-dimensional.

Before using this library, you need to install it. Installation varies from system to system, but since I expect most readers are on Windows, just open the command prompt and run [pip install tensorflow].

If you are working in an Anaconda Jupyter notebook like me, you can also run [conda install tensorflow] in the command prompt. Since I already have it installed, it reports 'requirement already satisfied'.
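If you are not sure whether the installation worked, a quick check from Python can help. This is a minimal sketch; the helper name `package_installed` is hypothetical and only uses the standard library's `importlib.util.find_spec`, which returns `None` when a package cannot be found.

```python
import importlib.util


def package_installed(name: str) -> bool:
    """Return True if a top-level package called `name` is importable."""
    # find_spec returns None when the package is not on the Python path.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if package_installed("tensorflow"):
        import tensorflow as tf
        print("TensorFlow", tf.__version__, "is installed")
    else:
        print("TensorFlow not found; try: pip install tensorflow")
```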

Here are our dependencies, including TensorFlow. One new thing is line 10 of the code above: 'urllib.request', which lets us load the data set directly from a URL.
Here we set up two figures. The first plots our data, with flight distance on the x axis and the price you pay for that distance on the y axis. The second shows the cost function, i.e. our error in predicting the actual flight price, which, as you all know, decreases over the gradient descent iterations.
This part of the code loads the data set directly from the URL.
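The original plotting code is not reproduced here, so the following is only a sketch of one way to lay out the two panels with matplotlib, assuming they sit side by side in a single figure (data on the left, cost on the right). The function name `make_figures` and the labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt


def make_figures():
    """Create two side-by-side axes: the data on the left, the cost on the right."""
    fig, (ax_data, ax_cost) = plt.subplots(1, 2, figsize=(10, 4))
    ax_data.set_title("Flight price vs. distance")
    ax_data.set_xlabel("Distance")
    ax_data.set_ylabel("Price")
    ax_cost.set_title("Cost per iteration")
    ax_cost.set_xlabel("Iteration")
    ax_cost.set_ylabel("Cost")
    fig.tight_layout()
    return fig, ax_data, ax_cost
```

Later you would call `ax_data.scatter(...)` with the distances and prices, and `ax_cost.plot(...)` with the recorded cost history.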
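A minimal sketch of loading a text file with `urllib.request`. The post does not reproduce the data set's URL, so `url` is simply a parameter you supply; the helper name `fetch_lines` is hypothetical.

```python
import urllib.request


def fetch_lines(url: str, encoding: str = "utf-8") -> list:
    """Download a text file and return its lines, still uncleaned."""
    with urllib.request.urlopen(url) as response:
        raw = response.read().decode(encoding)
    # splitlines() splits on newlines without keeping them.
    return raw.splitlines()
```

Usage: `lines = fetch_lines(url)` where `url` points at the "airq402.dat.text" file.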
At the moment we have raw data, so we need to clean it up with Python's string method 'strip'. A more detailed explanation is available here.
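Here is a sketch of the cleaning step. It assumes whitespace-separated columns and treats the column positions of distance and price as parameters (`x_col`, `y_col`, 0-based); the defaults below are assumptions, so check them against the data set's description before relying on them.

```python
def parse_rows(lines, x_col=3, y_col=2):
    """Turn raw text lines into (distances, prices) lists of floats.

    x_col and y_col are 0-based column indices; the defaults are assumptions.
    """
    distances, prices = [], []
    for line in lines:
        line = line.strip()      # drop leading/trailing whitespace and newlines
        if not line:
            continue             # skip empty lines
        fields = line.split()    # split on any run of whitespace
        distances.append(float(fields[x_col]))
        prices.append(float(fields[y_col]))
    return distances, prices
```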
Let's look at the input data. 
Here is our data. There is no cost curve yet because we haven't performed any regression, so there is no cost or loss at the moment. Our task now is to fit a straight line that captures the relationship between flight distance and flight price.

Here we normalize our data before processing it.
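The post does not say which normalization it applies. One common choice is z-score standardization (min-max scaling is another); the sketch below assumes z-scores and returns the mean and standard deviation so new inputs can be scaled the same way later. The name `zscore` is illustrative.

```python
import numpy as np


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation.

    Returns the scaled array together with the mean and std used.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()
    return (values - mean) / std, mean, std
```

Usage: `x_scaled, x_mean, x_std = zscore(distances)`.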
Motivation behind normalization

Since the range of values of raw data varies widely, in some machine learning algorithms, objective functions will not work properly without normalization. For example, the majority of classifiers calculate the distance between two points by the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by this particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.
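A small numeric illustration of the point above, using two made-up samples with one large-range feature (distance in miles) and one small-range feature (a market share between 0 and 1). The feature ranges used for scaling are assumptions chosen for the example.

```python
import numpy as np

# Two hypothetical samples: (distance in miles, market share as a fraction).
a = np.array([500.0, 0.10])
b = np.array([520.0, 0.90])

# Unscaled: sqrt(20**2 + 0.8**2) ≈ 20.02, so the 20-mile gap dominates.
raw = np.linalg.norm(a - b)

# Min-max scaling with assumed ranges: distance 0..2500, share 0..1.
lo = np.array([0.0, 0.0])
hi = np.array([2500.0, 1.0])
scaled = np.linalg.norm((a - lo) / (hi - lo) - (b - lo) / (hi - lo))
# Scaled: sqrt(0.008**2 + 0.8**2) ≈ 0.80, now the share difference matters.

print(f"raw distance:    {raw:.2f}")
print(f"scaled distance: {scaled:.2f}")
```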

Another reason to apply feature scaling is that gradient descent converges much faster with it than without it.
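One way to see why: for a squared-error cost, the number of gradient descent iterations needed grows roughly with the condition number of the cost's Hessian, which is proportional to X^T X for the design matrix X = [1, x]. This sketch compares that condition number for raw and standardized distances; the distances are made-up values, not taken from the data set.

```python
import numpy as np

# Made-up flight distances in miles (illustrative only).
distance = np.array([300.0, 650.0, 900.0, 1400.0, 2100.0, 2500.0])


def gram_condition(x):
    """Condition number of X^T X for the design matrix X = [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.cond(X.T @ X)


raw_cond = gram_condition(distance)
z = (distance - distance.mean()) / distance.std()
scaled_cond = gram_condition(z)

print(f"condition number, raw:    {raw_cond:.3e}")
print(f"condition number, scaled: {scaled_cond:.3f}")
```

After standardization X^T X becomes a multiple of the identity, so its condition number is 1 and every parameter direction is equally easy for gradient descent.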

The second part of the code defines the thetas (the parameters) and 'h', the hypothesis.
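A sketch of the hypothesis and cost in plain NumPy, assuming the usual linear-regression form h(x) = theta0 + theta1 * x and the mean squared error cost J = 1/(2m) * sum((h(x) - y)^2). The function names are illustrative, not taken from the original code.

```python
import numpy as np


def hypothesis(theta0, theta1, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x, applied element-wise."""
    return theta0 + theta1 * np.asarray(x, dtype=float)


def cost(theta0, theta1, x, y):
    """Mean squared error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    y = np.asarray(y, dtype=float)
    residuals = hypothesis(theta0, theta1, x) - y
    return np.sum(residuals ** 2) / (2 * len(y))
```

For example, the line theta0 = 1, theta1 = 2 fits the points (0, 1), (1, 3), (2, 5) exactly, so its cost is 0.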

We are using the most beloved optimization algorithm in machine learning: 'Gradient Descent'. At each iteration it moves the parameters in the direction that reduces the error (loss), and the cost recorded at each iteration tells us how large that error currently is.
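The post's TensorFlow code is not reproduced here, so the following is a sketch written in TensorFlow 2.x eager style. That is an assumption: the original may use the older 1.x graph API instead. It computes gradients with `tf.GradientTape` and applies the update by hand so each gradient descent step is explicit. `fit_line` is a hypothetical name, and the inputs are assumed to be normalized already.

```python
import numpy as np
import tensorflow as tf


def fit_line(x, y, learning_rate=0.1, iterations=100):
    """Fit y ≈ theta0 + theta1 * x with plain gradient descent.

    Returns the learned parameters and the cost at every iteration.
    """
    x = tf.constant(np.asarray(x, dtype=np.float32))
    y = tf.constant(np.asarray(y, dtype=np.float32))
    theta0 = tf.Variable(0.0)
    theta1 = tf.Variable(0.0)
    history = []
    for _ in range(iterations):
        with tf.GradientTape() as tape:
            h = theta0 + theta1 * x                         # hypothesis
            loss = tf.reduce_mean(tf.square(h - y)) / 2.0   # cost J
        d0, d1 = tape.gradient(loss, [theta0, theta1])
        theta0.assign_sub(learning_rate * d0)               # step against the gradient
        theta1.assign_sub(learning_rate * d1)
        history.append(float(loss.numpy()))
    return float(theta0.numpy()), float(theta1.numpy()), history
```

Usage: `theta0, theta1, costs = fit_line(x_scaled, y_scaled)`, then plot `costs` in the cost panel.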

Here you can see the fitted line, which tells us how much a flight should cost for a given distance. The right figure shows the cost at each iteration. One can see that after about 20 iterations the cost has dropped close to its minimum, so further iterations barely improve the fit.
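Because the model is trained on normalized values, reading a price off the line means scaling the new distance the same way and then undoing the scaling on the output. This sketch assumes both distance and price were z-score standardized before training; if only the distance was scaled, drop the last step. `predict_price` is a hypothetical helper.

```python
def predict_price(distance, theta0, theta1, x_mean, x_std, y_mean, y_std):
    """Predict a fare in original units from a model trained on z-scored data."""
    x_scaled = (distance - x_mean) / x_std    # scale the input like the training data
    y_scaled = theta0 + theta1 * x_scaled     # hypothesis in normalized space
    return y_scaled * y_std + y_mean          # undo the price standardization
```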

What's Next 
Linear Classification