
I’m using MATLAB (GitHub code repository). The details of the network are:

  • Hidden units: 100 (variable)

  • Epochs: 500

  • Batch size: 100

The weights are updated using the backpropagation algorithm.

I’ve been able to recognize 0, 1, 2, 3, 4, 5, 6 and 8, which I have drawn in Photoshop.

However, 7 and 9 are not recognized, even though on the test set the network gets only 749/10000 wrong and correctly classifies 9251/10000.

Any idea what might be wrong? Based on the test set results it is learning, and learning correctly.

2 Answers


  1. I don’t see anything downright incorrect in your code, but there is a lot that can be improved:

    1. You use this to set the initial weights:

      hiddenWeights = rand(hiddenUnits,inputVectorSize);
      outputWeights = rand(outputVectorSize,hiddenUnits);
      
      hiddenWeights = hiddenWeights./size(hiddenWeights, 2);
      outputWeights = outputWeights./size(outputWeights, 2);
      

      This will make your weights very small, I think. Not only that, but you will have no negative values, so you’ll throw away half of the sigmoid’s range of values. I suggest you try:

      weights = 2*rand(x, y) - 1
      

      This will generate random numbers in [-1, 1]. You can then scale that interval to get smaller weights (try dividing by the square root of the layer size), as in the sketch below.
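
      A minimal sketch of that initialization, reusing the hiddenWeights and outputWeights names from your code (the square-root scaling is one common choice, not the only one):

      % Symmetric initialization in [-1, 1], scaled down by the square root of the fan-in
      hiddenWeights = (2*rand(hiddenUnits, inputVectorSize) - 1) ./ sqrt(inputVectorSize);
      outputWeights = (2*rand(outputVectorSize, hiddenUnits) - 1) ./ sqrt(hiddenUnits);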

    2. You use this as the output delta:

      outputDelta = dactivation(outputActualInput).*(outputVector - targetVector) % (tk-yk)*f'(yin)
      

      Multiplying by the derivative is what you do for the squared-error loss. For log loss (which is usually the one used in classification), you should have just outputVector - targetVector. It might not make that big a difference, but you might want to try it; see the sketch below.
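
      A minimal sketch of that change, keeping your variable names (this is the sigmoid-output plus log-loss case, where the derivative factor cancels out):

      % Output delta for log loss (cross-entropy) with a sigmoid output layer:
      % the f'(yin) factor cancels, leaving only the error term
      outputDelta = outputVector - targetVector;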

    3. You say in the comments that the network doesn’t detect your own sevens and nines. This can suggest overfitting on the MNIST data. To address this, you’ll need to add some form of regularization to your network: either weight decay or dropout. A weight-decay sketch follows below.
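
      A minimal sketch of the weight-decay option, assuming a hypothetical learningRate variable and a small decay constant lambda (both values would need tuning):

      % L2 weight decay: shrink every weight slightly on each update,
      % in addition to the normal backpropagation step
      lambda = 1e-4;   % decay strength (hypothetical value)
      hiddenWeights = hiddenWeights - learningRate*lambda*hiddenWeights;
      outputWeights = outputWeights - learningRate*lambda*outputWeights;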

    4. You should try different learning rates as well, if you haven’t already.

    5. You don’t seem to have any bias neurons. Each layer, except the output layer, should have a neuron that only returns the value 1 to the next layer. You can implement this by adding another feature to your input data that is always 1 (see the sketch below).
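
      A minimal sketch of that idea, assuming the inputs are stored one example per column in a matrix called inputBatch (a hypothetical name):

      % Append a constant-1 row so every example carries a bias input;
      % hiddenWeights then needs inputVectorSize + 1 columns, and its last
      % column acts as the hidden layer's bias vector
      inputBatch = [inputBatch; ones(1, size(inputBatch, 2))];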

    MNIST is a big data set for which better algorithms are still being researched. Your network is very basic: small, with no regularization, no bias neurons and no improvements to classic gradient descent. It’s not surprising that it’s not working too well; you’ll likely need a more complex network for better results.

  2. Nothing to do with neural nets or your code, but this picture of KNN-nearest digits shows that some MNIST digits are simply hard to recognize:

    [image: KNN-nearest MNIST digits]
