In this blog we will see which step-length control works better for SGD: Ada-grad or Bold-driver. For this comparison we have chosen a dataset called 'Occupancy.csv', whose attributes are listed below.

Data Attribute Information:

• date: time stamp, in year-month-day hour:minute:second format
• Temperature, in Celsius
• Relative Humidity, in %
• Light, in Lux
• CO2, in ppm
• Humidity Ratio: derived quantity from temperature and relative humidity, in kg-water-vapor/kg-air
• Occupancy: 0 or 1 (0 for not occupied, 1 for occupied)
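
As a quick sketch of the setup (the pandas-based loading, the exact column names, and the 80/20 chronological train/test split are my assumptions and may differ from the original notebook):

```python
import numpy as np
import pandas as pd

# Load the dataset; column names are assumed to match the attribute list above
# ('Humidity' is assumed to be the relative-humidity column)
df = pd.read_csv('Occupancy.csv')
feature_cols = ['Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio']
x = df[feature_cols].to_numpy(dtype=float)
y = df['Occupancy'].to_numpy(dtype=float)

# Prepend a column of 1's so that beta_0 (the intercept) is learned
# like any other weight
x = np.hstack([np.ones((x.shape[0], 1)), x])

# Simple chronological 80/20 train/test split (an assumption)
split = int(0.8 * len(y))
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]
```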

So, our task here is to perform logistic regression with SGD using two different step-length controls (Ada-grad and Bold-driver) and see the difference: that is, which step-length control reaches the minimum loss in fewer iterations/epochs.

First, we will define some functions to keep this blog and the task organized. In the last blog we already saw the Bold-driver step length, but for everyone's convenience let's define it again, so that you can access it here without hopping back to the previous blog.

## Functions for Bold-driver

• The Bold Driver heuristic assumes that smaller step sizes are needed when closer to the optimum.
• It adjusts the step size based on the value of f(x) at time t.
• If the value of f(x) grows, the step size must decrease.
• If the value of f(x) decreases, the step size can be larger for faster convergence.

• mu is the step size of the last update.
• f_new and f_old are the objective values after and before the last update.
• mu+ and mu- are the step-size increase and decay factors.

These are the steps involved in computing the Bold-driver step-length control, which we now write as Python functions below.
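
Here is a minimal sketch of these functions, assuming a logistic (negative log-likelihood) loss, NumPy arrays, and full-batch gradient descent for the Bold-driver run. The helper names and the default factors (mu_plus = 1.05, mu_minus = 0.5) are illustrative choices, not necessarily the blog's originals:

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(beta, x, y, eps=1e-12):
    # Average logistic loss over the given samples
    p = sigmoid(x @ beta)
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

def bold_driver_step(beta, grad, x, y, mu, mu_plus=1.05, mu_minus=0.5):
    """One Bold-driver update of beta and mu.

    If the loss improved, keep the new beta and grow the step size;
    otherwise keep the old beta (undo the step) and decay the step size.
    """
    f_old = neg_log_likelihood(beta, x, y)
    beta_new = beta - mu * grad
    f_new = neg_log_likelihood(beta_new, x, y)
    if f_new < f_old:
        return beta_new, mu * mu_plus
    return beta, mu * mu_minus

def train_bold_driver(x, y, x_test, y_test, epochs=100, mu=0.1):
    # Full-batch gradient descent with Bold-driver step-length control,
    # recording the test loss after every epoch
    beta = np.zeros(x.shape[1])
    test_losses = []
    for _ in range(epochs):
        grad = x.T @ (sigmoid(x @ beta) - y) / len(y)
        beta, mu = bold_driver_step(beta, grad, x, y, mu)
        test_losses.append(neg_log_likelihood(beta, x_test, y_test))
    return beta, test_losses
```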
With that, we are done defining the functions for Bold-driver. In case any of you have questions about them, feel free to leave them in the comments.

## Functions for Ada-grad

Similarly, we will write a few more Python functions for Ada-grad and implement it as the step-size control in our logistic regression.

• The function takes its inputs in matrix form; also, 1's should be added as the first column of x_train and x_test to account for beta_0 (as in the loading snippet above).
• x_train, y_train, x_test, y_test are matrices (a sketch of the function follows this list).
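
A minimal sketch under the same assumptions, reusing sigmoid and neg_log_likelihood from above; the initial step size mu0, the epsilon smoothing term, and the fixed random seed are illustrative defaults:

```python
def train_sgd_adagrad(x_train, y_train, x_test, y_test,
                      epochs=100, mu0=0.1, eps=1e-8):
    """Logistic regression via SGD with Ada-grad per-coordinate step sizes.

    x_train/x_test must already contain a leading column of 1's for beta_0.
    Returns beta and the per-epoch loss on the test set.
    """
    rng = np.random.default_rng(0)
    n, d = x_train.shape
    beta = np.zeros(d)
    g_accum = np.zeros(d)                # running sum of squared gradients
    test_losses = []
    for _ in range(epochs):
        for i in rng.permutation(n):
            xi, yi = x_train[i], y_train[i]
            grad = (sigmoid(xi @ beta) - yi) * xi   # per-sample logistic-loss gradient
            g_accum += grad ** 2
            # Ada-grad: each coordinate gets mu0 scaled down by its own gradient history
            beta -= mu0 * grad / (np.sqrt(g_accum) + eps)
        test_losses.append(neg_log_likelihood(beta, x_test, y_test))
    return beta, test_losses
```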

Now let's plot the differences, i.e., see which step-length control converges faster and to a lower loss than the other.
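
A sketch of how the comparison plot could be produced with matplotlib, using the two training functions defined above (the variable names losses_bold and losses_adagrad are mine):

```python
import matplotlib.pyplot as plt

_, losses_bold = train_bold_driver(x_train, y_train, x_test, y_test, epochs=100)
_, losses_adagrad = train_sgd_adagrad(x_train, y_train, x_test, y_test, epochs=100)

plt.plot(losses_adagrad, label='Ada-grad (SGD)')
plt.plot(losses_bold, label='Bold-driver (gradient descent)')
plt.xlabel('epoch')
plt.ylabel('logistic loss (test set)')
plt.legend()
plt.title('Ada-grad vs Bold-driver step-length control')
plt.show()
```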
Here one can see that on the test set Ada-grad wins the race: it converges in fewer iterations and reaches the minimum loss, whereas Bold-driver still carries a much larger loss even after 100 epochs.

## What's Next

1. Regularization
2. Hyperparameter tuning
3. Newton's method for logistic regression