AlexNet
AlexNet is a landmark neural network: it was the first to show convincingly that a CNN (Convolutional Neural Network) can outperform the other approaches of its time at image classification. You can think of it as the ancestor of most of the CNN-based networks in use today, so it is worth understanding in detail.
Highlights of Architecture
- Number of Parameters : 60 Million
- Number of Neurons : 650,000
- Number of Convolutional Layers : 5 convolutional layers, some followed by max pooling layers
- First Layer : 224 x 224 x 3 image input, 96 kernels of size 11 x 11 x 3 with stride of 4
- Second Layer : 256 kernels of size 5 x 5 x 48
- Third Layer : 384 kernels of size 3 x 3 x 256
- Fourth Layer : 384 kernels of size 3 x 3 x 192
- Fifth Layer : 256 kernels of size 3 x 3 x 192
- Number of Fully Connected Layers : 3
- First two fully connected layers : 4096 neurons each
- Final layer : 1000-way softmax
- Use of Non-saturating Neuron (ReLU)
- Use of Dropout
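- Image Size : 256 x 256 after rescaling and cropping (the network itself is fed 224 x 224 patches taken from these images)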
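As a rough sanity check on the "60 Million" figure, the parameter count can be recomputed from the kernel sizes listed above. The sketch below is only an illustration: the layer names and helper functions are hypothetical, and it assumes that the first fully connected layer receives a 6 x 6 x 256 feature map, a size that is not stated in the list above.

```python
# Parameter count implied by the layer list above.
# Each entry: (name, number of kernels, kernel height, kernel width, kernel depth).
CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]

# Each entry: (name, inputs, outputs).
# ASSUMPTION: the first fully connected layer sees a 6 x 6 x 256 feature map;
# this size is not stated in the list above.
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]


def conv_params(kernels, height, width, depth):
    """Weights plus one bias per kernel."""
    return kernels * height * width * depth + kernels


def fc_params(inputs, outputs):
    """Weights plus one bias per output neuron."""
    return inputs * outputs + outputs


def count_parameters():
    """Return a dict mapping layer name to its parameter count."""
    counts = {}
    for name, k, h, w, d in CONV_LAYERS:
        counts[name] = conv_params(k, h, w, d)
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = fc_params(n_in, n_out)
    return counts


if __name__ == "__main__":
    counts = count_parameters()
    for name, n in counts.items():
        print(f"{name}: {n:,}")
    print(f"total: {sum(counts.values()):,}")
```

Under these assumptions the total comes to roughly 61 million, in line with the figure quoted above, and almost all of it sits in the three fully connected layers.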
Why ReLU ?
Why use ReLU (a non-saturating function) rather than tanh (a saturating function)? Because networks with ReLU were observed to train several times faster than equivalent networks with tanh (shown in Figure 1 of Ref [1]).
ReLU also does not require input normalization to keep it from saturating.
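To see what "saturating" means in practice, it helps to compare the derivatives of the two functions for growing inputs. The following is a small illustrative sketch (the function names are made up for this example, not taken from the paper): the gradient of tanh shrinks toward zero as the input grows, while the gradient of ReLU stays at 1 for any positive input.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU; the value at exactly 0 is set to 0 by convention here."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    for x in (0.5, 2.0, 5.0, 10.0):
        print(f"x={x:5.1f}  tanh'={tanh_grad(x):.2e}  relu'={relu_grad(x):.1f}")
```

With a vanishing gradient, a weight update driven by gradient descent becomes tiny, which is one plausible reading of why the tanh network in the paper's comparison learns more slowly.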
Why use Dropout ?
It is to reduce overfitting in the fully-connected layers.
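A minimal sketch of the idea, operating on a plain list of activations. The drop probability of 0.5 and the scaling of outputs at test time follow the description in Ref [1] but are not stated in this article, and the function names are hypothetical.

```python
import random


def dropout_train(activations, drop_prob=0.5, rng=random):
    """Training time: zero each activation independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else a for a in activations]


def dropout_test(activations, drop_prob=0.5):
    """Test time: keep every activation but scale it by the keep probability."""
    keep_prob = 1.0 - drop_prob
    return [a * keep_prob for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    hidden = [0.3, 1.2, 0.0, 2.5, 0.7, 1.1]
    print("train:", dropout_train(hidden, rng=rng))
    print("test: ", dropout_test(hidden))
```

Because a different random subset of neurons is dropped for every training example, a neuron cannot rely on the presence of particular other neurons, which is the intuition usually given for why this reduces overfitting.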
How are the training images processed to fit the input dimension ?
The training images do not all have the same size. So the authors rescaled each image so that its shorter side has length 256 and then cropped out the central 256x256 patch from the rescaled image. Apart from subtracting the mean activity over the training set from each pixel, they applied no other preprocessing, so the network was trained on the (centered) raw RGB values of the pixels.
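The geometry of this rescale-then-crop step can be sketched with two small helper functions. The function names and the rounding to whole pixels are assumptions made for this illustration; the actual resampling of pixel values is left out.

```python
def rescale_size(width, height, target=256):
    """Image size after scaling so that the shorter side equals target."""
    scale = target / min(width, height)
    # Rounding to whole pixels is an assumption; max() guards against
    # floating-point rounding pushing a side below target.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    return new_w, new_h


def center_crop_box(width, height, crop=256):
    """(left, top, right, bottom) of the central crop x crop patch."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


if __name__ == "__main__":
    w, h = rescale_size(500, 375)      # a landscape image
    print((w, h))                      # (341, 256)
    print(center_crop_box(w, h))       # (42, 0, 298, 256)
```

For a 500 x 375 image, the shorter side (375) is scaled to 256, the longer side becomes 341, and the central 256 columns are kept.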
Why CNN rather than standard feedforward Network ?
In theory, a standard feedforward network can solve any type of classification problem if it is given enough neurons. In practice, however, we do not know exactly how many neurons are enough for a given application, and we do not know whether a network of that size can be trained with a reasonable amount of computing power.
The paper (Ref [1]) puts it as follows :
Compared to standard feedforward neural networks with similarly-sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically-best performance is likely to be only slightly worse.
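The difference in parameter count can be made concrete with a back-of-the-envelope comparison for the first layer. This sketch assumes that the first convolutional layer produces a 55 x 55 x 96 output volume, a size that is not stated in this article, and compares the convolutional layer with a hypothetical fully connected layer that has the same number of inputs and outputs.

```python
# ASSUMPTION: the first convolutional layer produces a 55 x 55 x 96 output
# volume; that size is not stated in this article.
INPUT_UNITS = 224 * 224 * 3
OUTPUT_UNITS = 55 * 55 * 96


def conv_layer_params(kernels=96, k=11, depth=3):
    """Shared weights: one k x k x depth kernel plus one bias per feature map."""
    return kernels * k * k * depth + kernels


def dense_layer_params(n_in=INPUT_UNITS, n_out=OUTPUT_UNITS):
    """Fully connected: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


if __name__ == "__main__":
    conv = conv_layer_params()
    dense = dense_layer_params()
    print(f"convolutional layer : {conv:,} parameters")
    print(f"fully connected     : {dense:,} parameters")
    print(f"ratio               : {dense / conv:,.0f}x")
```

Under these assumptions the fully connected version needs over a million times more parameters than the convolutional layer, which illustrates why weight sharing makes the network so much easier to train.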
Fighting against Overfitting
As you may know, overfitting is one of the recurring problems in neural-network-based machine learning. The paper (Ref [1]) uses two forms of data augmentation to tackle it, as summarized below.
- Generating image translations and horizontal reflections. They do this by extracting random 224 x 224 patches (and their horizontal reflections) from the 256 x 256 images and using the extracted patches as training data.
- Altering the intensities of the RGB channels in the training images. For this, the authors performed PCA on the set of RGB pixel values throughout the training set, and then added to each training image multiples of the principal components found, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1.
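The first of these augmentations can be sketched in a few lines. This is an illustrative version only: the image is represented as a plain list of rows, the function name is hypothetical, and the 50% flip probability is an assumption rather than something stated above. The PCA-based color augmentation is not shown.

```python
import random


def random_crop_and_flip(image, crop=224, rng=random):
    """Return a random crop x crop patch of image, mirrored half of the time.

    image is a list of rows; each row is a list of pixels (any pixel type).
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:                 # ASSUMPTION: flip with probability 0.5
        patch = [row[::-1] for row in patch]   # horizontal reflection
    return patch


if __name__ == "__main__":
    # Dummy 256 x 256 "image" whose pixels record their own (row, column).
    image = [[(r, c) for c in range(256)] for r in range(256)]
    patch = random_crop_and_flip(image, rng=random.Random(1))
    print(len(patch), len(patch[0]))       # 224 224
```

Calling the function again on the same image yields a differently shifted (and possibly mirrored) patch, so the network rarely sees exactly the same input twice.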
Reference
[1] ImageNet Classification with Deep Convolutional Neural Networks
by Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton