Course materials and notes for Stanford class CS231n: Convolutional Neural Networks for Visual Recognition. Convolutional Neural Networks are very similar to ordinary Neural Networks: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity.
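To make the description of a single neuron concrete, here is a minimal sketch in plain Python. The function names (`neuron_forward`, `relu`) and the input values are illustrative assumptions, not part of the course code, and ReLU is just one possible choice of non-linearity.

```python
def relu(z):
    """One common non-linearity: zero out negative pre-activations."""
    return max(0.0, z)


def neuron_forward(inputs, weights, bias, nonlinearity=None):
    """Dot product of inputs and weights plus a bias, optionally followed by a non-linearity."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    pre_activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    if nonlinearity is None:
        return pre_activation
    return nonlinearity(pre_activation)


# Hypothetical values chosen only for illustration.
x = [1.0, -2.0, 0.5]
w = [0.2, 0.4, -1.0]
b = 0.1

linear_out = neuron_forward(x, w, b)                   # dot product + bias only
relu_out = neuron_forward(x, w, b, nonlinearity=relu)  # same, followed by ReLU
```

The weights and the bias are the learnable parameters; the non-linearity itself has none.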

The whole network still expresses a single differentiable score function: from the raw image pixels on one end to class scores at the other. What changes is that ConvNet architectures assume the inputs are images, which lets us encode certain properties into the architecture. These properties then make the forward function more efficient to implement and vastly reduce the number of parameters in the network.

In a regular Neural Network, each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer, and where neurons within a single layer function completely independently and do not share any connections. For small images the number of weights per neuron still seems manageable, but clearly this fully-connected structure does not scale to larger images.

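For an image of more respectable size, every fully-connected neuron in the first hidden layer needs one weight per input value, so the number of weights grows with the product of the image width, height, and number of color channels. Moreover, we would almost certainly want to have several such neurons, so the parameters would add up quickly! Clearly, this full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting. Convolutional Neural Networks take advantage of the fact that the input consists of images, and they constrain the architecture in a more sensible way. In particular, the layers of a ConvNet arrange their neurons in three dimensions: width, height, and depth. Here depth refers to the third dimension of an activation volume, not to the depth of a full Neural Network, which can refer to the total number of layers in a network.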
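The scaling argument can be checked with a few lines of arithmetic. The sketch below counts the parameters of fully-connected neurons that look at a whole image; the helper names and the specific sizes are assumptions chosen for illustration (32x32x3 is the CIFAR-10 image size, while 200x200x3 and the 100-neuron layer are hypothetical).

```python
def fc_weights_per_neuron(width, height, channels):
    """A fully-connected neuron needs one weight per input value."""
    return width * height * channels


def fc_layer_params(width, height, channels, num_neurons):
    """Total parameters of a fully-connected layer: weights plus one bias per neuron."""
    return num_neurons * (fc_weights_per_neuron(width, height, channels) + 1)


# Illustrative sizes (assumptions, see the paragraph above).
small_image_weights = fc_weights_per_neuron(32, 32, 3)     # 3,072 weights
larger_image_weights = fc_weights_per_neuron(200, 200, 3)  # 120,000 weights
layer_total = fc_layer_params(200, 200, 3, num_neurons=100)

print(small_image_weights, larger_image_weights, layer_total)
```

A single hypothetical layer of 100 such neurons already holds about twelve million parameters, which is the growth the text warns about.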

As we will soon see, the neurons in a layer will only be connected to a small region of the layer before it, instead of to all of its neurons in a fully-connected manner.

Left: A regular 3-layer Neural Network.

Every layer has a simple API: it transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters. The CONV layer computes the output of neurons that are connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to in the input volume. The fully-connected layer computes the class scores, where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer is connected to all the numbers in the previous volume.
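The idea of a layer that maps one 3D volume to another through local dot products can be sketched with NumPy. This is a deliberately naive loop-based version, not an efficient or official implementation; the function name `conv_forward_naive`, the absence of zero padding, and the sizes used (a 32x32x3 input and 12 filters of size 5x5) are all assumptions made for the example.

```python
import numpy as np


def conv_forward_naive(volume, filters, biases, stride=1):
    """Map an input volume (H, W, D) to an output volume (H_out, W_out, K).

    filters has shape (K, F, F, D): K filters, each covering an F x F
    spatial region and the full input depth D. No zero padding is applied.
    """
    height, width, depth = volume.shape
    num_filters, size, _, filter_depth = filters.shape
    if filter_depth != depth:
        raise ValueError("filter depth must match input depth")

    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    out = np.zeros((out_h, out_w, num_filters))

    for k in range(num_filters):
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                # Each output neuron sees only a small local region of the input.
                patch = volume[r:r + size, c:c + size, :]
                # Dot product between the neuron's weights and that region.
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out


# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filters = rng.standard_normal((12, 5, 5, 3))
biases = np.zeros(12)

activations = conv_forward_naive(image, filters, biases)
print(activations.shape)  # (28, 28, 12)
```

Each filter here has 5*5*3 = 75 weights plus a bias, regardless of how large the input image is, which is where the parameter savings over full connectivity come from.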

Note that some layers contain parameters and others don't. POOL layers, for example, implement a fixed function. In the example visualization, each volume of activations along the processing path is shown as a column. Since it's difficult to visualize 3D volumes, we lay out each volume's slices in rows. The last layer volume holds the scores for each class, but here we only visualize the sorted top 5 scores and print the label of each one. The architecture shown here is a tiny VGG Net, which we will discuss later.
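Picking out the sorted top 5 scores from the final volume, as the visualization does, might look roughly like the sketch below. The helper name `top_k_scores` and the score values are made up for illustration; the labels are the ten CIFAR-10 categories.

```python
def top_k_scores(scores, labels, k=5):
    """Return the k (score, label) pairs with the highest scores, best first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


cifar10_labels = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]
# Hypothetical class scores, one per category.
class_scores = [0.1, 2.3, -0.5, 1.7, 0.0, 3.2, -1.1, 0.9, 2.8, 0.4]

top5 = top_k_scores(class_scores, cifar10_labels)
for score, label in top5:
    print(f"{label}: {score:.1f}")
```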