
Python for Data: (8) Ada-grad vs Bold-driver for linear classification

In this blog we will see which step-length control works better, Ada-grad or Bold-driver. We use a data set called 'Occupancy.csv'.

Data Attribute Information:

Date and time, as year-month-day hour:minute:second 
Temperature, in Celsius 
Relative Humidity, % 
Light, in Lux 
CO2, in ppm 
Humidity Ratio, derived quantity from temperature and relative humidity, in kg water-vapor/kg air 
Occupancy, 0 or 1, 0 for not occupied, 1 for occupied status

So our task here is to fit a logistic regression (a linear classifier) with SGD using two different step-length controls (Ada-grad and Bold-driver) and compare them, that is, to see which one reaches the minimum loss in fewer iterations/epochs. 

First we will define some functions to keep this blog and the task organized. In the last blog we already saw the Bold-driver step length, but for your convenience let's define it again so that you can use it here without hopping back to the previous blog. 

Functions for Bold-driver

  • The Bold Driver Heuristic makes the assumption that smaller step sizes are needed when closer to the optimum
  • It adjusts the step size based on the value of f(x) at time t 
  • If the value of f(x) grows, the step size must decrease
  • If the value of f(x) decreases, the step size can be larger for faster convergence

  • Mu is the step size of the last update
  • f-old and f-new are the function values before and after the last update
  • mu+ and mu- are the step-size increase and decay factors

These are the steps involved in computing the Bold-driver step-length control, which we are going to write as a Python function.
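As a minimal sketch of the rule described in the bullets above (not necessarily the exact function used for the results in this blog), the step-size adjustment could look like this. The function name `bold_driver_step` and the default factors 1.1 and 0.5 are assumptions chosen for illustration.

```python
def bold_driver_step(mu, f_new, f_old, mu_plus=1.1, mu_minus=0.5):
    """Return the adjusted step size after one update.

    mu       : step size used for the last update
    f_new    : function (loss) value after the last update
    f_old    : function (loss) value before the last update
    mu_plus  : increase factor (> 1), assumed default
    mu_minus : decay factor between 0 and 1, assumed default
    """
    if f_new < f_old:
        # The loss went down: be bolder and take a larger step next time.
        return mu * mu_plus
    # The loss did not go down: shrink the step size.
    return mu * mu_minus
```

For example, `bold_driver_step(0.01, f_new=0.40, f_old=0.45)` grows the step size because the loss fell, while swapping the two loss values would halve it.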
Here we are done with defining the functions for Bold-driver. In case any of you have 

Ada-grad function definition 

Similarly, we will write some more Python functions for Ada-grad and use it as the step-size control in our logistic regression.

Here is the pseudo code for Ada-grad. Let's write it in Python. 
  • The function takes its inputs in matrix form. A column of 1's should also be added as the first column of x_train and x_test to account for Beta_0 
  • x_train, y_train, x_test, y_test are matrices
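A minimal sketch of logistic regression trained by SGD with an Ada-grad step size, assuming NumPy, is given here. It is an illustration, not the original code of this blog: the names `logistic_sgd_adagrad`, `mu0`, `eps` and `seed` are assumptions, and for brevity it takes `y_train` as a 1-D array of 0/1 labels and records only the training loss.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep np.exp from overflowing on extreme inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def log_loss(x, y, beta):
    # Mean negative log-likelihood of the logistic model.
    p = np.clip(sigmoid(x @ beta), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def logistic_sgd_adagrad(x_train, y_train, epochs=100, mu0=0.1, eps=1e-8, seed=0):
    """x_train must already contain the column of 1's for Beta_0."""
    rng = np.random.default_rng(seed)
    n, d = x_train.shape
    beta = np.zeros(d)
    hist = np.zeros(d)   # running sum of squared gradients, one per coefficient
    losses = []
    for _ in range(epochs):
        for i in rng.permutation(n):          # one pass over shuffled rows
            xi, yi = x_train[i], y_train[i]
            grad = (sigmoid(xi @ beta) - yi) * xi
            hist += grad ** 2
            # Ada-grad: each coefficient gets its own, shrinking step size.
            beta -= mu0 / (np.sqrt(hist) + eps) * grad
        losses.append(log_loss(x_train, y_train, beta))
    return beta, losses
```

The returned `losses` list holds one value per epoch, so it can be used directly to compare convergence against another step-length control.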
Here both Bold-driver and Ada-grad are ready to be used with logistic regression. Now let's use them on our 'Occupancy' data set. 

We drop the 'na' rows from the data set, and also the [date] column, because [date] has nothing to do with our classification. 
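The cleaning step could be sketched with pandas roughly as follows. The helper name `prepare_occupancy` and the target column name `Occupancy` are assumptions based on the attribute list above; adjust them to the actual headers of your file.

```python
import numpy as np
import pandas as pd

def prepare_occupancy(df, target="Occupancy"):
    """Drop missing rows and the date column, then build X (with 1's) and y."""
    df = df.dropna().drop(columns=["date"])
    y = df[target].to_numpy(dtype=float)
    x = df.drop(columns=[target]).to_numpy(dtype=float)
    # Prepend a column of 1's so that the first coefficient acts as Beta_0.
    x = np.hstack([np.ones((x.shape[0], 1)), x])
    return x, y
```

A typical call would be `x, y = prepare_occupancy(pd.read_csv("Occupancy.csv"))`, followed by a split into training and test sets.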
The results above use Ada-grad as the step-length control. 
The results above use Bold-driver as the step-length control in the logistic regression algorithm. 
Let's plot the differences, i.e. which step-length control converges faster and to a lower loss than the other. 
Here one can see that on the test set Ada-grad wins the race and converges in fewer iterations with the minimum loss, whereas Bold-driver has a much larger loss, and even after 100 epochs its loss is still large compared to Ada-grad. 

 What's Next

  1. Regularization 
  2. Hyperparameter tuning 
  3. Newton method for logistic regression