18 May 2009

This tutorial introduces the reader to the concept of neural networks by presenting one of the earliest neural network structures, the perceptron. It was proposed back in the late 1950s and, compared with more recent networks, has a lot of drawbacks, but it is the perfect starting point for someone wanting to learn about the field. If you want, you can just get the code for this small tutorial, which is found here, but it would be wise to read on to understand how the perceptron works and grasp the theory behind it.

Tutorial Prerequisites

  • The reader should have a basic understanding of C/C++
  • The reader should know how to compile and run a program using any of the popular C compilers

Tutorial Goals

  • The reader will understand the concept of neural networks
  • The reader will understand how the perceptron works
  • The reader will apply the perceptron in a small toy problem application of differentiating between RGB colors.


Tutorial Body

This tutorial is written in the hope that it can be of use to people trying to learn more about artificial neural networks. A little bit of history, a little bit of what relation they have to biological neural networks and a lot of C++ source code examples of neural networks is what you can expect. As always I can not promise that the code contained in the tutorial is the best implementation of a perceptron but it is enough to make our point and to show to someone interested in neural networks a simple perceptron implementation.

First of all, let's see what exactly a neural network is! Neural networks are abstract mathematical models based on the way the brain works. Take for example our brain. It has on the order of 10^11 neurons inside it, all together forming a powerful, massively parallel computing machine. These neurons all communicate with each other via links called synapses, as can be seen in the picture on the right. Each neuron has many dendrites. These are the receptors, the parts of the neuron that accept synapses and receive incoming signals from other neurons. Moreover, each neuron has one axon, which is the part of the neuron sending out electrical signals to other neurons. As can be seen in the picture, this is done in an electrochemical way, and further explanation is beyond the scope of this tutorial. For anyone really interested in the inner workings of the brain I would recommend the book Neuroscience: Exploring the Brain by Mark F. Bear, Barry W. Connors and Michael A. Paradiso. It is a very well written book and explains everything in a way that even non-medical students, like myself, can understand.

The correlation between artificial neural networks and the brain stops at neurons and their connections. From there on they are two quite different machines. The artificial neuron at the heart of artificial neural networks (from here on abbreviated as ANN) was first described by McCulloch and Pitts in 1943, and the first trainable ANN built on it, the perceptron, was introduced by Frank Rosenblatt. The perceptron has quite a simple structure. In its basic form it consists of one neuron, as can be seen in the picture. It is given many inputs, and they are all connected to the neuron by synapses which each carry a corresponding weight (w1 to w4). These weights define the strength of each connection, that is, how much the particular connection will contribute to the final result that the neuron produces. So what a neuron does is produce a weighted sum of its inputs. Once that is done, the perceptron neuron passes this result through an activation function, which squashes it into a fixed range such as 0 to 1.
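
The weighted sum described above can be sketched in a few lines of C++. This is only an illustration, not code from the tutorial's download; the function name weightedSum is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Net input of a single neuron: every input multiplied by the weight of
// its connection, all added together. (weightedSum is a hypothetical
// helper name, not part of the tutorial's source code.)
float weightedSum(const std::vector<float>& inputs,
                  const std::vector<float>& weights)
{
    assert(inputs.size() == weights.size()); // one weight per input
    float sum = 0.0f;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        sum += inputs[i] * weights[i];
    return sum;
}
```

For example, with inputs {1.0, 0.5} and weights {2.0, 4.0} the sum is 1.0*2.0 + 0.5*4.0 = 4.0.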

Frequently used activation functions are:

  • The threshold function, f(x) = 1 if x >= 0 and 0 otherwise
  • The sigmoid function, f(x) = 1 / (1 + e^(-x))
  • The hyperbolic tangent function, f(x) = (e^(2x) - 1) / (e^(2x) + 1)

The perceptron, as we already said, computes a weighted sum of its inputs, but how does it learn? How does it know what each input pattern corresponds to? The answer is that it does not! You, or someone else who will act as a teacher, a supervisor (hence the name supervised learning), will teach it. The way this is done is that each input pattern (each collection of Xi in the above diagram) is associated with a target. The function that connects the input and the target output is what the perceptron must find. The way it accomplishes this is by this very simple rule:

    W(n+1) = W(n) + η (d(n) - y(n)) * x(n)

where W(n) is the old weights vector, W(n+1) is the new weights vector, η is a user-defined constant called the teaching step, d(n) is the target, y(n) is the actual output of the network and x(n) is, well ... you guessed it, the corresponding input!
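
As a rough sketch, the three activation functions and one application of the learning rule could be written as below. The function names are invented for this illustration and are not taken from the tutorial's source code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The three activation functions listed above (hypothetical names).
float threshold(float x)         { return x >= 0.0f ? 1.0f : 0.0f; }
float sigmoid(float x)           { return 1.0f / (1.0f + std::exp(-x)); }
float hyperbolicTangent(float x) { return std::tanh(x); } // same as (e^(2x)-1)/(e^(2x)+1)

// One application of the rule W(n+1) = W(n) + eta * (d(n) - y(n)) * x(n).
// target is d(n), output is y(n), inputs is x(n); weights is updated in place.
void updateWeights(std::vector<float>& weights,
                   const std::vector<float>& inputs,
                   float eta, float target, float output)
{
    assert(weights.size() == inputs.size()); // one weight per input
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] += eta * (target - output) * inputs[i];
}
```

Starting from weights {0, 0} with inputs {1.0, 0.5}, a teaching step of 0.5, a target of 1 and an output of 0, a single update gives weights {0.5, 0.25}. When the output already equals the target, the weights do not change at all.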

That was the theory behind the perceptron. But who likes theories? What I want to see is some code, right? Well, here we go then. We will try to solve a simple problem.

    // Each row is one training pattern: RED, GREEN, BLUE, CLASS.
    // CLASS_RED and CLASS_BLUE are constants defined elsewhere in the program.
    int ourInput[] = {
        0,   0,   255, CLASS_BLUE,
        0,   0,   192, CLASS_BLUE,
        243, 80,  59,  CLASS_RED,
        255, 0,   77,  CLASS_RED,
        77,  93,  190, CLASS_BLUE,
        255, 98,  89,  CLASS_RED,
        208, 0,   49,  CLASS_RED,
        67,  15,  210, CLASS_BLUE,
        82,  117, 174, CLASS_BLUE,
        168, 42,  89,  CLASS_RED,
        248, 80,  68,  CLASS_RED,
        128, 80,  255, CLASS_BLUE,
        228, 105, 116, CLASS_RED
    };

This is an array with our example's inputs. Each row holds the RGB values of a color followed by its class. There are just two classes: CLASS_RED if the color is predominantly red and CLASS_BLUE if the color is predominantly blue. Pretty simple, huh? Now let's head on to create a perceptron which will be able to differentiate between these two classes. Below you can see our perceptron class.

    #include <vector>

    enum activationFuncs {THRESHOLD = 1, SIGMOID, HYPERBOLIC_TANGENT};

    class Perceptron
    {
    private:
        std::vector<float> inputVector;   // a vector holding the perceptron's inputs
        std::vector<float> weightsVector; // a vector holding the corresponding input weights
        int activationFunction;           // the activation function type
    public:
        Perceptron(int inputNumber, int function);    // the constructor
        void inputAt(int inputPos, float inputValue); // the input population function
        float calculateNet(); // weighted sum of the inputs passed through the activation function
        void adjustWeights(float teachingStep, float output, float target); // applies the learning rule
        float recall(float red, float green, float blue); // a recall for our example program
    };

It has inputs, the weights we mentioned and an activation function. The network is initialized with random weights between -0.5 and 0.5. Since our inputs are RGB values, which range from 0 to 255, it is a good idea to normalize them, which means to map each of them to a corresponding value between 0 and 1.0. Let's take a look at how the training is done in code. This is a snippet from the main function of the program:

    // let's create a perceptron with 3 inputs,
    // using the sigmoid as the activation function
    Perceptron ann(3, SIGMOID);
    float mse = 999;
    float output = 0;
    int inputCounter = 0;
    int epochs = 0;
    // The training of the neural network: keep going while the error is too large
    while (mse > LEASTMEANSQUAREERROR)
    {
        mse = 0;
        float error = 0;
        inputCounter = 0;
        // Run through all 13 input patterns, what we call an EPOCH
        for (int j = 0; j < inputPatterns; j++)
        {
            for (int k = 0; k < 3; k++) // give the 3 RGB values to the network
            {
                ann.inputAt(k, normalize(ourInput[inputCounter]));
                inputCounter++;
            }
            // let's get the output of this particular RGB pattern;
            // ourInput[inputCounter] now holds the target class of the pattern
            output = ann.calculateNet();
            // let's add the (absolute) error for this iteration to the total error
            error += fabs(ourInput[inputCounter] - output);
            // and let's adjust the weights according to that error
            ann.adjustWeights(TEACHINGSTEP, output, ourInput[inputCounter]);
            inputCounter++; // next pattern
        }

        mse = error / inputPatterns; // the mean error for this epoch
        printf("The mean square error of epoch %d is %.4f\n", epochs, mse);
        epochs++;
    }

What can we see here? This is the training of the perceptron. While the mean square error (mse) is greater than the defined least mean square error, we keep iterating through all the input patterns. For each input pattern we calculate the output of the neural network with the weights currently assigned to it. Then we compute the absolute difference between that output and the actual desired output (so what the code calls mse is, strictly speaking, a mean absolute error). Subsequently we adjust the weights according to the rule we showed above and proceed to the next input pattern. As we already said, this goes on until the mean square error reaches the desired magnitude.

When that happens, our network is considered sufficiently trained. Since our toy problem has few inputs and is an easy problem to solve, the chosen least mean square error is 0.0001. The smaller the mean square error your network gets to, the better it knows how to solve your problem for the data you trained it with. Be aware though that this does not mean that it is better at solving the problem in general. By asking for a very small mean square error you run the risk of over-training your network, leading it to recognize only the patterns you gave as input and to make mistakes on all other patterns. If that happens, the network can not generalize over the wide array of all your possible input patterns, which means it has not learned the problem correctly.
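
The class methods themselves are not shown in this tutorial, so here is one way they could be written, together with a small training function. This is a sketch under assumptions, not the code from the download: the names SketchPerceptron and train, the values chosen for CLASS_RED and CLASS_BLUE, the division by 255 in normalize, the fixed random seed and the fixed number of epochs are all choices made for this illustration. Only the sigmoid activation is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Assumed values: the recall code treats outputs above 0.5 as blue.
const float CLASS_RED  = 0.0f;
const float CLASS_BLUE = 1.0f;

// Assumed normalization: map a 0..255 color value to 0..1.
float normalize(float value) { return value / 255.0f; }

class SketchPerceptron // hypothetical stand-in for the tutorial's Perceptron
{
    std::vector<float> inputs;
    std::vector<float> weights;
public:
    explicit SketchPerceptron(std::size_t inputNumber, unsigned seed = 42)
        : inputs(inputNumber, 0.0f), weights(inputNumber, 0.0f)
    {
        // random starting weights between -0.5 and 0.5
        std::mt19937 rng(seed);
        std::uniform_real_distribution<float> dist(-0.5f, 0.5f);
        for (float& w : weights) w = dist(rng);
    }
    void inputAt(std::size_t pos, float value)
    {
        assert(pos < inputs.size());
        inputs[pos] = value;
    }
    float calculateNet() const
    {
        float net = 0.0f; // weighted sum of the inputs
        for (std::size_t i = 0; i < inputs.size(); ++i)
            net += inputs[i] * weights[i];
        return 1.0f / (1.0f + std::exp(-net)); // sigmoid activation
    }
    void adjustWeights(float teachingStep, float output, float target)
    {
        // W(n+1) = W(n) + eta * (d(n) - y(n)) * x(n)
        for (std::size_t i = 0; i < weights.size(); ++i)
            weights[i] += teachingStep * (target - output) * inputs[i];
    }
    float recall(float red, float green, float blue)
    {
        assert(inputs.size() == 3);
        inputAt(0, red); inputAt(1, green); inputAt(2, blue);
        return calculateNet();
    }
};

// Trains on rows of {R, G, B, class} for a fixed number of epochs and
// returns the mean absolute error of the last epoch.
float train(SketchPerceptron& ann,
            const std::vector<std::vector<float>>& patterns,
            float teachingStep, int epochs)
{
    assert(!patterns.empty());
    float meanError = 0.0f;
    for (int e = 0; e < epochs; ++e) {
        float error = 0.0f;
        for (const std::vector<float>& p : patterns) {
            const float output =
                ann.recall(normalize(p[0]), normalize(p[1]), normalize(p[2]));
            error += std::fabs(p[3] - output);
            ann.adjustWeights(teachingStep, output, p[3]);
        }
        meanError = error / static_cast<float>(patterns.size());
    }
    return meanError;
}
```

A caller would build a SketchPerceptron with 3 inputs, pass it to train along with rows such as {0, 0, 255, CLASS_BLUE} and {243, 80, 59, CLASS_RED}, and then call recall on new colors. Training for a fixed number of epochs instead of down to an error threshold is a simplification that keeps the sketch from looping for a very long time.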

Enough with that, now let's head on to recalling the network with various values input by the user.

    int R, G, B;
    float result = 0;
    char reply = ' ';
    while (reply != 'N')
    {
        printf("Give a RED value (0-255)\n");
        std::cin >> R;
        printf("Give a GREEN value (0-255)\n");
        std::cin >> G;
        printf("Give a BLUE value (0-255)\n");
        std::cin >> B;
        // outputs above 0.5 are treated as blue, everything else as red
        result = ann.recall(normalize(R), normalize(G), normalize(B));
        if (result > 0.5)
            printf("The value you entered belongs to the BLUE CLASS\n");
        else
            printf("The value you entered belongs to the RED CLASS\n");

        printf("Do you want to continue with trying to recall values from the perceptron?");
        printf("\nEnter any character for YES or 'N' for no, to exit the program\n");
        std::cin >> reply;
    }

Well, here you can easily see that the user can enter values continuously and get a reply from the neural network. If sufficiently trained, it will correctly classify all values EXCEPT for those which are very close to the edge between blue and red, even if it has been trained on them. That is a very important deficiency of the perceptron. It can only solve linearly separable problems, that is, problems whose different classes can be divided by a straight line, as can be seen in the picture on the right.
If a problem can be so nicely and linearly classified, all is well and the perceptron can do the job for us. If not, then bad things will happen.

This was shown by Marvin Minsky and Seymour Papert, and as they wrote in their book Perceptrons (1969), a perceptron can not even solve a problem as simple as the XOR problem, since it is not linearly separable. The book contributed to the so-called AI winter, which led AI research away from neural networks, considered useless after the bashing of perceptrons. Fortunately that lasted only until 1986, when neural networks came back into mainstream AI with the introduction of multi-layer perceptrons and the back-propagation learning rule, which makes up for the deficiency of the simple perceptron. You can read about them in the multi-layer perceptron tutorial.
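
To see the XOR limitation in practice, here is a small experiment. It is a sketch with its own assumptions: it uses the threshold activation, a teaching step of 1 and an extra bias weight (which the tutorial's perceptron does not have, but which is needed to learn even AND with a threshold at zero). The function name epochsToConverge is made up for this illustration.

```cpp
#include <array>
#include <cassert>

// Trains a two-input threshold perceptron (plus a bias weight) on the four
// binary patterns (0,0), (0,1), (1,0), (1,1) with the given targets.
// Returns the first epoch in which every pattern was classified correctly,
// or -1 if that never happened within maxEpochs.
int epochsToConverge(const std::array<int, 4>& targets, int maxEpochs)
{
    assert(maxEpochs > 0);
    const int inputs[4][2] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    int w1 = 0, w2 = 0, bias = 0; // integer weights, since the teaching step is 1
    for (int epoch = 1; epoch <= maxEpochs; ++epoch) {
        int mistakes = 0;
        for (int p = 0; p < 4; ++p) {
            const int net = w1 * inputs[p][0] + w2 * inputs[p][1] + bias;
            const int output = net >= 0 ? 1 : 0;   // threshold activation
            const int delta = targets[p] - output; // d(n) - y(n)
            if (delta != 0) {
                ++mistakes;
                w1 += delta * inputs[p][0];
                w2 += delta * inputs[p][1];
                bias += delta;
            }
        }
        if (mistakes == 0) return epoch;
    }
    return -1;
}
```

Called with the AND targets {0, 0, 0, 1}, the function returns a positive epoch number, because a straight line can separate the single 1 from the three 0s. Called with the XOR targets {0, 1, 1, 0}, it returns -1 no matter how large maxEpochs is, because no single line separates the two classes.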

The source code of the perceptron tutorial can be downloaded from here. All it needs is compiling, and you can watch the perceptron in action or play around with your own parameters by tweaking the various defines in main.c. As always, the usual disclaimer applies: this might not be the best or most optimal way to implement a perceptron. I would be delighted if people actually got intrigued about neural networks from this tutorial and were inspired to delve deeper into AI.

Please do feel free to email me with any comments, advice or constructive criticism at: lefteris *at* realintelligence *dot* net and stay tuned for a multi-layer perceptron tutorial, which will be coming soon.