
Coursera Deep Learning Course 2

Training/Dev/Test Set
What are the training set, dev set, and test set?
In the traditional methodology, when the dataset is small, a 60-20-20 split into training set, dev (validation) set, and test set works well.
With big data, it is fine for the dev set and test set to be much less than 10 or 20 percent of the data; even a 98-1-1 split is reasonable.
One rule of thumb: the dev set and the test set should come from the same distribution.
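As a rough sketch of such a split (the function name and default fractions are my own, not from the course), assuming the examples are stacked along the first axis:

```python
import numpy as np

def split_dataset(X, y, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle and split data into train/dev/test sets.

    With a large dataset, dev/test fractions as small as 1% each
    (a 98-1-1 split) are often enough.
    """
    m = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(m)                     # shuffle before splitting
    n_dev = int(m * dev_frac)
    n_test = int(m * test_frac)
    dev_idx = idx[:n_dev]
    test_idx = idx[n_dev:n_dev + n_test]
    train_idx = idx[n_dev + n_test:]
    return ((X[train_idx], y[train_idx]),
            (X[dev_idx], y[dev_idx]),
            (X[test_idx], y[test_idx]))
```

With one million examples, the default 1%/1% fractions give 10,000 dev and 10,000 test examples, which is usually plenty to evaluate a model.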

Bias and Variance
Bias refers to a high error rate on the training set. It may be due to underfitting. To fix it we can change the neural network architecture, for example use a bigger network or train for more iterations. Variance refers to the gap between the training error and the dev-set error. It may be due to overfitting of the data, and it can be reduced by adding more data or by regularization.
The bias-variance trade-off means reducing one without increasing the other. Regularization is used to reduce variance. It may hurt bias, and bias may increase a little, but not by much if we have a big enough network.
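A toy illustration of this diagnosis (the error thresholds here are arbitrary assumptions of mine, not from the course):

```python
def diagnose(train_err, dev_err, bayes_err=0.0, tol=0.02):
    """Rough bias/variance diagnosis from error rates (a simplification).

    train_err much above bayes_err  -> high bias (underfitting)
    dev_err much above train_err    -> high variance (overfitting)
    """
    problems = []
    if train_err - bayes_err > tol:
        problems.append("high bias")       # try a bigger network, train longer
    if dev_err - train_err > tol:
        problems.append("high variance")   # try more data or regularization
    return problems or ["looks fine"]
```

For example, 15% train error with 16% dev error points at high bias, while 1% train error with 11% dev error points at high variance; both can occur at once.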

L2 Regularization - for the variance problem
It is used to avoid overfitting in the network.
For logistic regression the regularized cost is J(w, b) = (1/m) * sum_i L(yhat(i), y(i)) + (lambda/2m) * ||w||_2^2. For a neural network the penalty sums the squared entries of every weight matrix: J = (1/m) * sum_i L(yhat(i), y(i)) + (lambda/2m) * sum_l ||W[l]||_F^2. In neural networks this matrix version of the L2 penalty is called the Frobenius norm.
It is also known as 'weight decay': when the extra term is included in the gradient, each update first multiplies W[l] by (1 - alpha*lambda/m), a factor slightly below 1, before applying the usual gradient step.
The extra term introduces a new hyperparameter, the regularization parameter lambda, so when tuning hyperparameters we should consider this one as well.
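The regularized cost above can be sketched in code (names are my own; `weights` is assumed to be a list of the network's weight matrices):

```python
import numpy as np

def l2_cost(cross_entropy_cost, weights, lambd, m):
    """Add the L2 (Frobenius norm) penalty to an unregularized cost.

    J = cross-entropy + (lambda / (2*m)) * sum over layers of ||W[l]||_F^2
    """
    penalty = sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy_cost + (lambd / (2 * m)) * penalty
```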

Why L2 regularization helps:
* It penalizes the neural network for having large weights.
* The intuition is that a big network with large weights has the capacity to overfit the data.
* L2 regularization pushes the weights toward zero, or in clearer terms it reduces the effect of the weights, making the network effectively simpler and smaller.
* It is a very powerful technique and is used in most deep learning work.
* When we plot the cost against the number of gradient-descent iterations, with regularization (and the regularization term included in the plotted cost) the cost should decrease monotonically.
* Another intuition: with a large lambda the weights become small, so activations such as tanh stay near their linear region; every layer then computes a roughly linear function, which cannot overfit as easily.
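The 'weight decay' effect can be seen directly in a single gradient-descent update (a sketch; variable names are mine):

```python
def update_with_weight_decay(W, dW, alpha, lambd, m):
    """One gradient-descent step with the L2 term included in the gradient.

    Equivalent form: W <- (1 - alpha*lambd/m) * W - alpha * dW,
    i.e. W is first shrunk by a factor slightly below 1, hence 'weight decay'.
    """
    return W - alpha * (dW + (lambd / m) * W)
```

With `alpha = 0.1`, `lambd = 1.0`, `m = 10` and no data gradient at all, a weight of 10.0 decays to 9.9 in one step.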

Drop out regularization
This is another very powerful regularization method. Dropout can be implemented in different ways; the most common is inverted dropout, in which each hidden unit (together with its connections) is temporarily removed from the network with some probability on every forward pass.
In practice, a different random set of nodes is zeroed out for each training example and each iteration; that randomness is what 'dropout' refers to.
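A minimal sketch of an inverted-dropout forward step (the function name and default seed are my own assumptions):

```python
import numpy as np

def inverted_dropout(A, keep_prob, rng=None):
    """Inverted dropout on an activation matrix A (training time only).

    Each unit is kept with probability keep_prob; the survivors are scaled
    by 1/keep_prob so the expected activation is unchanged, which is why no
    extra scaling is needed at test time.
    """
    rng = rng or np.random.default_rng(0)
    D = rng.random(A.shape) < keep_prob   # boolean mask of kept units
    A = A * D                             # zero out the dropped units
    A = A / keep_prob                     # scale up the survivors ("inverted")
    return A, D
```

The mask `D` must be cached, because the same units have to be dropped in the backward pass of that iteration.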

Drop out and the cost function
With dropout, no extra term is added to the cost function; we are just eliminating random nodes. A side effect is that the cost J is no longer well defined on any single pass, so the cost-vs-iterations plot is harder to interpret. Dropout is used most heavily in computer vision: there is rarely enough data, so researchers expect overfitting and add dropout layers almost by default.

Data augmentation
If the neural network is overfitting, one way to fix it is to add more data. But in computer vision, for example, the amount of data available may be limited, so instead we can apply operations to the existing images, like flipping them horizontally, to enlarge the training set. This is called data augmentation.
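A minimal example of one such operation, horizontal flipping with NumPy (the helper name is mine):

```python
import numpy as np

def augment_flips(images):
    """Double a training set of images by adding horizontal mirrors.

    images: array of shape (m, height, width) or (m, height, width, channels).
    """
    flipped = images[:, :, ::-1]                  # reverse the width axis
    return np.concatenate([images, flipped], axis=0)
```

Each flipped image carries the same label as its original, so the label array is simply concatenated with itself.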

Early stopping
Early stopping means stopping the training of the neural network early so that the weights stay small. Since we initialize the weights small, they start close to zero and grow during training; stopping partway through leaves them at a mid-sized value, which has an effect similar to L2 regularization and helps reduce overfitting. But it is not an ideal method, because it breaks the orthogonality principle of tuning a DNN, that is, separate actions for separate goals: one action now affects both optimizing the cost and avoiding overfitting. In the course, Andrew Ng prefers L2 regularization, although searching for a good lambda is a costly procedure.
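A generic early-stopping loop might look like this sketch (the `patience` heuristic and the callback interface are my own assumptions, not from the course):

```python
def train_with_early_stopping(train_step, dev_error, max_iters=1000, patience=10):
    """Stop training when the dev-set error stops improving.

    train_step: hypothetical callback that runs one training iteration.
    dev_error:  hypothetical callback that returns the current dev-set error.
    patience:   how many iterations without improvement to tolerate.
    """
    best_err, best_iter = float("inf"), 0
    for it in range(max_iters):
        train_step()
        err = dev_error()
        if err < best_err:
            best_err, best_iter = err, it   # new best, remember it
        elif it - best_iter >= patience:
            break                           # dev error stopped improving
    return best_err
```

In a real setup one would also snapshot the weights at the best iteration and restore them when stopping.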

