Neural networks have been around for a number of decades now and have seen their ups and downs. Recently they’ve proved to be extremely powerful for image recognition problems. Or, rather, a particular type of neural network called a convolutional neural network has proved very effective. In this post, I want to build off of the series of posts I wrote about neural networks a few months ago, plus some ideas from my post on digital images, to explain the difference between a convolutional neural network and a classical (is that the right term?) neural network.
First, let me quickly review the idea behind a neural network: We start with a collection of neurons, each of which takes a collection of input values and uses them to calculate a single output value. Then we hook them all together, so that the inputs to each neuron are attached to either the outputs of other neurons or to coordinates/features of a data point that is fed into the network.
When you input a data point into a neural network, the outputs of the first level of neurons are calculated, then they feed into the later neurons and so on until all the neurons have set their outputs based (directly or indirectly) on the input data. Abstractly, we can think of each neuron in a neural network as representing an “idea”. The output of the neuron should be a value close to 1 if that “idea” is present in a given input data point, and close to 0 otherwise. The earlier neurons will represent relatively simple, low-level ideas, while the later neurons represent higher level, more abstract ideas that are combinations of the ideas defined by the earlier neurons. This perspective is kind of hard to grasp in general, but it starts to make sense in more specific contexts, such as analyzing pictures.
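To make the level-by-level calculation concrete, here is a minimal sketch in Python. It assumes each neuron computes a weighted sum plus an offset and squashes the result into the range 0 to 1. The names `squash`, `forward` and `network`, and all of the weights, are made up for illustration and are not taken from any particular library.

```python
import math

def squash(x):
    """Smooth function that maps any number to a value between 0 and 1 (assumed here)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(layers, data_point):
    """Feed a data point through the network one level at a time.

    `layers` is a list of levels; each level is a list of (weights, bias)
    pairs, one pair per neuron. This layout is an assumption of the sketch.
    """
    values = list(data_point)
    for layer in layers:
        # Every neuron in this level reads the outputs of the previous level.
        values = [
            squash(sum(w * v for w, v in zip(weights, values)) + bias)
            for weights, bias in layer
        ]
    return values

# Hypothetical, hand-picked weights: two first-level neurons, then one later neuron.
network = [
    [([4.0, -4.0], 0.0), ([-4.0, 4.0], 0.0)],  # first level reads the 2 input features
    [([6.0, 6.0], -3.0)],                      # later level reads both first-level outputs
]

outputs = forward(network, [1.0, 0.0])
print(outputs)
```

In a real network the weights would be set during training rather than picked by hand.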
If we’re making a neural network to analyze images, then the input to the neural network will be a vector like we saw in the post on digital images: each dimension will represent how light one of the pixels is (or one of its RGB values if it’s a color image, but for simplicity let’s stick to grey-scale). We saw that when we encoded images as vectors this way, vectors that were nearby in the data space corresponded to images that matched up very closely. We can use this fact to understand how the neurons in a neural network respond to an image.
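As a rough illustration of this encoding, the sketch below flattens a tiny grey-scale image into a vector and measures how far apart two such vectors are. The 2x2 images are invented, the helper names `image_to_vector` and `distance` are hypothetical, and Euclidean distance is assumed as the notion of "nearby".

```python
import math

def image_to_vector(image):
    """Flatten a 2D grid of grey-scale pixel values (a list of rows) into one list."""
    return [pixel for row in image for pixel in row]

def distance(u, v):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Three made-up 2x2 images; 0.0 is black and 1.0 is white.
original = [[0.0, 1.0], [1.0, 0.0]]
slightly_changed = [[0.1, 0.9], [1.0, 0.0]]
inverted = [[1.0, 0.0], [0.0, 1.0]]

near = distance(image_to_vector(original), image_to_vector(slightly_changed))
far = distance(image_to_vector(original), image_to_vector(inverted))
print(near, far)
```

The slightly changed image lands much closer to the original than the inverted one does, which is the sense in which nearby vectors correspond to images that match up closely.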
The standard way for a neuron to compute its output is to take a weighted sum of its input values, then apply a function with a steep drop-off that sends all values below some threshold to values near 0, and all values above that threshold to values near 1. By a weighted sum, I mean that each input is multiplied by a preset value (or, rather, a value that is set during the training phase), then the results are all added together. These preset values define a vector with the same number of features/dimensions as the vector defined by the input values, and this weighted sum is essentially a dot product of the two vectors.
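Here is a small sketch of a single neuron along these lines. It assumes the steep function is a logistic sigmoid, which is one common choice. The `steepness` parameter, the function names, and the example weights and threshold are all hypothetical.

```python
import math

def sigmoid(x, steepness=10.0):
    """Logistic function: near 0 for clearly negative x, near 1 for clearly positive x.

    `steepness` is an illustrative knob that controls how sharp the drop-off is.
    """
    return 1.0 / (1.0 + math.exp(-steepness * x))

def neuron_output(weights, inputs, threshold):
    """Weighted sum of the inputs, pushed through the steep function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Shifting by the threshold puts the drop-off at weighted_sum == threshold.
    return sigmoid(weighted_sum - threshold)

# Made-up weights; in practice these are set during the training phase.
weights = [0.5, -0.25, 1.0]

above = neuron_output(weights, [1.0, 0.0, 1.0], threshold=1.0)  # weighted sum is 1.5
below = neuron_output(weights, [0.0, 1.0, 0.0], threshold=1.0)  # weighted sum is -0.25
print(above, below)
```

The first input gives a weighted sum above the threshold, so the output is close to 1. The second falls below it, so the output is close to 0.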
If you don’t remember what a dot product is, don’t worry – all you need to know is that (under the appropriate assumptions that I’ll gloss over) the resulting value is higher when the two vectors are closer together, and goes to zero as the vectors move farther apart. (Gory details: The dot product of two unit vectors is the cosine of the angle between them, and cosine is close to one for small angles, then goes to zero as the angle increases, up to a right angle.)
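The cosine behavior is easy to check numerically. This sketch builds unit vectors in the plane at a few angles and takes their dot product with a fixed reference vector. The helper names `dot` and `unit_vector` are just for illustration.

```python
import math

def dot(u, v):
    """Dot product: multiply matching coordinates and add up the results."""
    return sum(a * b for a, b in zip(u, v))

def unit_vector(angle_degrees):
    """Unit-length vector in the plane at the given angle from the x-axis."""
    angle = math.radians(angle_degrees)
    return [math.cos(angle), math.sin(angle)]

reference = unit_vector(0)
similarities = {
    angle: dot(reference, unit_vector(angle)) for angle in (0, 30, 60, 90)
}
for angle, value in similarities.items():
    print(f"{angle:>2} degrees apart: dot product = {value:.3f}")
```

The values shrink steadily from 1 at zero degrees to essentially 0 at a right angle.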
So, for the neurons that get their input directly from each incoming data point, we can interpret this as follows: The fixed vector of weights defines an image. The neuron calculates its output by comparing the input image to this fixed image, and if these images are reasonably close, its output is close to 1. Otherwise, it’s close to 0.
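Putting the pieces together, here is a sketch of a first-level neuron treated as an image comparison. It assumes both the weight image and the input image are scaled to unit length before the dot product (one way to satisfy the assumptions glossed over earlier), and it uses a logistic sigmoid as the steep function. The 3x3 images, the threshold of 0.8 and the steepness of 20 are invented for the example.

```python
import math

def normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length > 0 else list(v)

def template_neuron(template, image, threshold=0.8, steepness=20.0):
    """Output near 1 when `image` resembles `template`, near 0 otherwise."""
    similarity = sum(t * x for t, x in zip(normalize(template), normalize(image)))
    return 1.0 / (1.0 + math.exp(-steepness * (similarity - threshold)))

# Flattened 3x3 grey-scale images (hypothetical): the weights encode a vertical bar.
vertical_bar = [0, 1, 0,
                0, 1, 0,
                0, 1, 0]
noisy_vertical_bar = [0.1, 0.9, 0.0,
                      0.0, 1.0, 0.1,
                      0.0, 0.8, 0.0]
horizontal_bar = [0, 0, 0,
                  1, 1, 1,
                  0, 0, 0]

match = template_neuron(vertical_bar, noisy_vertical_bar)
mismatch = template_neuron(vertical_bar, horizontal_bar)
print(match, mismatch)
```

The noisy vertical bar is close to the weight image, so the neuron fires. The horizontal bar overlaps it in only one pixel, so the output stays near 0.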
Article Source: https://shapeofdata.wordpress.com/2015/01/24/convolutional-neural-networks/