
Python for Data: (15) Hyper-Parameter Tuning and Cross-Validation

What is a Hyper-parameter?

If you're new to machine learning, you may never have encountered the term hyper-parameters before. This blog will help you understand the concept of hyper-parameters and their Python implementation.

This website has a very good tutorial on the topic; have a look for deeper insight. Here I'll keep things brief, but very soon you can expect a complete theory blog on hyper-parameters. Please do visit again.

Your input data (also called training data) is a collection of individual records (instances) containing the features important to your machine learning problem. This data is used during training to configure your model to accurately make predictions about new instances of similar data. However, the actual values in your input data never directly become part of your model.

Your model's parameters are the variables that your chosen machine learning technique uses to adjust to your data. For example, a deep neural network (DNN) is composed of processing nodes (neurons), each with an operation performed on data as it travels through the network. When your DNN is trained, each node has a weight value that tells your model how much impact it has on the final prediction. Those weights are an example of your model's parameters. In many ways, your model's parameters are the model—they are what distinguishes your particular model from other models of the same type working on similar data.

Moreover, you can think about gradient descent: how fast your model reaches its minimum depends on choosing the best hyper-parameters. These include the step length (alpha), the regularization term (lambda), the optimization parameter (beta), and so on. Let's implement this in Python.

Suppose you have a dataset called "redwine.csv" that you have already pre-processed and split into train and test parts. Now we will find the best hyper-parameters for mini-batch gradient descent. Let's write some Python functions to do so.
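First, a fold-splitting helper. The sketch below is illustrative rather than the original code: the function name, the contiguous-fold strategy, and the assumption that data_x and data_y are NumPy arrays are all mine.

```python
import numpy as np

def kfold_split(data_x, data_y, n_folds, fold_index):
    """Split data_x/data_y into training and validation sets,
    using contiguous fold `fold_index` (0-based) as the validation fold."""
    n = len(data_x)
    fold_size = n // n_folds
    start = fold_index * fold_size
    # the last fold absorbs any remainder rows
    end = start + fold_size if fold_index < n_folds - 1 else n
    val_idx = np.arange(start, end)
    train_idx = np.concatenate([np.arange(0, start), np.arange(end, n)])
    return (data_x[train_idx], data_y[train_idx],
            data_x[val_idx], data_y[val_idx])
```

If your rows are ordered (e.g. sorted by quality), shuffle them once before splitting so each fold sees a representative sample.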

The function above splits our given data_x and data_y into a training set and a validation set, according to the number of folds.
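A sketch of what such a grid-search routine could look like. The helper minibatch_gd, the ridge-regression objective, and the exact signature are my illustrative assumptions, not the original code:

```python
import numpy as np

def minibatch_gd(X, y, alpha, lam, batch_size, iterations, rng):
    """Fit ridge-regression weights w with mini-batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iterations):
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        Xb, yb = X[idx], y[idx]
        # gradient of 0.5*mean squared error plus L2 penalty
        grad = Xb.T @ (Xb @ w - yb) / len(idx) + lam * w
        w -= alpha * grad
    return w

def GD_minibatch_GridSearch(data_x, data_y, n_folds, alphas, lambdas,
                            batch_size, iterations, seed=0):
    """Return (optimal_alpha, optimal_lambda, best_cv_loss) found by
    k-fold cross-validated grid search over mini-batch GD."""
    rng = np.random.default_rng(seed)
    n = len(data_x)
    fold = n // n_folds
    best_alpha, best_lambda, best_loss = None, None, np.inf
    for alpha in alphas:
        for lam in lambdas:
            fold_losses = []
            for k in range(n_folds):
                lo = k * fold
                hi = n if k == n_folds - 1 else (k + 1) * fold
                val = np.arange(lo, hi)
                train = np.concatenate([np.arange(0, lo), np.arange(hi, n)])
                w = minibatch_gd(data_x[train], data_y[train],
                                 alpha, lam, batch_size, iterations, rng)
                residual = data_x[val] @ w - data_y[val]
                fold_losses.append(np.mean(residual ** 2))
            cv_loss = np.mean(fold_losses)
            if cv_loss < best_loss:
                best_alpha, best_lambda, best_loss = alpha, lam, cv_loss
    return best_alpha, best_lambda, best_loss
```

The key idea is that every (alpha, lambda) pair is scored by its average validation loss across all folds, never by its training loss.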

The function above performs a grid search over mini-batch gradient descent to find the optimal parameters, such as optimal_alpha, optimal_lambda, and so on.

Now our next task is to define a range of values for each hyper-parameter and then find the best one.
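A sketch of the search-space definition, using the fold count, batch size, and epoch count described below; the specific alpha candidates and the lambda endpoints are illustrative assumptions, and the final (commented) call assumes a grid-search function and pre-processed train_x/train_y arrays defined earlier:

```python
import numpy as np

# Hyper-parameter search space (illustrative values)
n_folds = 5                        # 5-fold cross-validation
batch_size = 50                    # mini-batch size
iterations = 100                   # epochs of gradient descent
alphas = [0.001, 0.01, 0.05, 0.1]  # candidate step lengths
lambdas = np.logspace(-4, 2, 13)   # wide range for the regularization term

# Last line: run the grid search on the pre-processed training data, e.g.
# optimal_alpha, optimal_lambda, cv_loss = GD_minibatch_GridSearch(
#     train_x, train_y, n_folds, alphas, lambdas, batch_size, iterations)
```

A logarithmic grid for lambda is the usual choice, since regularization strengths matter on orders of magnitude rather than linear steps.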

Here you can choose the range of values to be searched for each parameter. As you can see, we have chosen 5-fold cross-validation, a batch size of 50, 100 iterations (epochs), and a wide range for the regularization parameter lambda, so the search can pick the best lambda for gradient descent.

Remember, we are using a dataset that was already pre-processed; it is passed to the Python function (GD_minibatch_GridSearch) in the last line of the code above.

Now let's see which parameters are best suited for our dataset to make better predictions.
These are the best parameters for our task of predicting whether the quality of a wine is good or bad based on its chemical properties. In other words, using these parameters will give you the best output in the shortest time. Let's plot them -
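One way to produce such a 3-D plot with matplotlib. The loss surface below is a synthetic stand-in with a minimum near alpha = 0.1 and lambda = 1e-2; in practice you would plot the cross-validation losses recorded during the grid search, one value per (alpha, lambda) pair:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # headless backend; safe without a display
import matplotlib.pyplot as plt

alphas = np.linspace(0.001, 0.2, 30)
lambdas = np.logspace(-4, 2, 30)
A, L = np.meshgrid(alphas, lambdas)

# Toy bowl-shaped surface standing in for the recorded CV losses
loss = (A - 0.1) ** 2 + 0.05 * (np.log10(L) + 2) ** 2 + 0.01

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(A, np.log10(L), loss, cmap="viridis")
ax.set_xlabel("alpha (step length)")
ax.set_ylabel("log10(lambda)")
ax.set_zlabel("CV loss")
fig.savefig("loss_surface.png")
```

Plotting lambda on a log10 axis keeps the wide regularization range readable; on a linear axis almost all the grid points would be crushed against zero.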

Here, the 3-D visualization shows how much loss the model incurs at specific values of the lambda and alpha parameters. This is why we always tune the hyper-parameters to achieve the least error/loss.


What's Next 
  1. Implement Newton’s Method