
I am currently trying to teach myself something about neural networks. So I bought the book Applied Artificial Intelligence by Wolfgang Beer, and I am now stuck on understanding a part of his code. Actually, I understand the code itself; I just do not understand one mathematical step behind it…
The part looks like this:

    for i in range(iterations):
        guessed = sig(inputs * weights)
        error = output - guessed
        adjustment = error * sig_d(guessed)
        # Why is there no learning rate?
        # Why is the adjustment relative to the error
        # multiplied by the derivative of your main function?
        weights += adjustment

I tried to look up how the gradient descent method works, but I never understood the part about adjusting the weights. How does the math behind it work, and why do you use the derivative for it?
Also, when I started looking on the internet for other solutions, I always saw them using a learning rate. I understand the concept of it, but why is it not used in this book? It would really help me if someone could answer these questions…

And thanks for all these rapid responses in the past.

2 Answers


  1. Why is there no learning rate?

    • There are lots of different flavors of neural networks; some use learning rates and others probably just keep this constant (see the sketch at the end of this answer).

    Why is the adjustment relative to the error

    • What else should it be relative to? If there is a lot of error, then chances are you need to make a large adjustment; if there was only a little error, then you would only want to adjust your weights by a small amount.

    multiplied by the derivative of your main function?

    • I don't really have a good answer for this one.
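    On the learning-rate point: one way to read the book's loop is that it performs the usual gradient-style update with the learning rate simply fixed at 1, so the factor disappears from the code. Below is a sketch of the same kind of loop with the rate made explicit; the sigmoid helpers, the toy data, and the `inputs.T.dot(...)` shape handling are my own assumptions for a single-layer setup, not the book's exact code.

        import numpy as np

        def sig(x):
            return 1.0 / (1.0 + np.exp(-x))

        def sig_d(s):
            # Derivative of the sigmoid, expressed via the sigmoid's output s.
            return s * (1.0 - s)

        # Made-up training data: the target is simply the second input column.
        inputs = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
        output = np.array([[1.0], [0.0], [1.0]])
        weights = np.zeros((2, 1))

        learning_rate = 1.0   # setting this to 1 recovers an update of the book's form
        for _ in range(10000):
            guessed = sig(inputs.dot(weights))
            error = output - guessed
            adjustment = learning_rate * inputs.T.dot(error * sig_d(guessed))
            weights += adjustment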
  2. [Figure: the error J(θ0, θ1) plotted as a surface over the weights θ0 and θ1, with red arrows marking the minimum points]

    To train a regression model we start with arbitrary weights and adjust the weights so that the error becomes minimal. If we plot the error as a function of the weights, we get a plot like the figure above, where the error J(θ0, θ1) is a function of the weights θ0 and θ1. We have succeeded when the error reaches the very bottom of the graph, where its value is the minimum. The red arrows show the minimum points in the graph. To reach a minimum point we take the derivative (the slope of the tangent line to the function) of our error function. The slope of the tangent is the derivative at that point, and it gives us a direction to move in. We take steps down the cost function in the direction of steepest descent. The size of each step is determined by the parameter α, which is called the learning rate.

    The gradient descent algorithm is:

    repeat until convergence:

        θj := θj − α · ∂J(θ0, θ1) / ∂θj

    where j = 0, 1 is the index of the weight and α is the learning rate.
    
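    As a concrete sketch (my own toy example, not from the course): gradient descent for a simple linear model hθ(x) = θ0 + θ1·x with a squared-error cost, written in plain NumPy. The data and variable names are made up purely for illustration.

        import numpy as np

        # Toy data, chosen so that y = 2x + 1 exactly.
        x = np.array([0.0, 1.0, 2.0, 3.0])
        y = np.array([1.0, 3.0, 5.0, 7.0])

        theta = np.zeros(2)   # arbitrary starting weights [θ0, θ1]
        alpha = 0.1           # learning rate α
        for _ in range(1000):
            guessed = theta[0] + theta[1] * x      # hθ(x)
            error = guessed - y
            # Partial derivatives of J(θ0, θ1) = (1/2m) · Σ (hθ(x) − y)²
            grad0 = error.mean()                   # ∂J/∂θ0
            grad1 = (error * x).mean()             # ∂J/∂θ1
            # θj := θj − α · ∂J/∂θj
            theta[0] -= alpha * grad0
            theta[1] -= alpha * grad1

        print(theta)   # approaches [1.0, 2.0]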

    [Figure: the error J(θ1) plotted as a function of a single weight θ1, with the slope of the tangent drawn at the current point]

    In the above figure we plot the error J(θ1) as a function of the weight θ1. We start with an arbitrary value of θ1 and take the derivative (the slope of the tangent) of the error J(θ1) to adjust the weight θ1 so that we can reach the bottom, where the error is minimal. If the slope is positive we have to go left, i.e. decrease the weight θ1; if the slope is negative we have to go right, i.e. increase θ1. We repeat this procedure until convergence, that is, until we reach the minimum point.

    [Figure: gradient descent steps with a small learning rate α (slow convergence) versus a large α (overshooting the minimum)]

    If the learning rate α is too small, gradient descent converges too slowly. If α is too large, gradient descent overshoots the minimum and can fail to converge.
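    A quick way to see this effect (again a toy example of my own, not from the course) is to run gradient descent on the one-dimensional cost J(θ1) = θ1², whose derivative is 2·θ1, with different values of α:

        def descend(alpha, theta=5.0, steps=20):
            # θ1 := θ1 − α · dJ/dθ1  with  J(θ1) = θ1²
            for _ in range(steps):
                theta = theta - alpha * 2 * theta
            return theta

        print(descend(alpha=0.01))   # ≈ 3.3  -> too small: after 20 steps still far from the minimum at 0
        print(descend(alpha=0.4))    # ≈ 0.0  -> converges quickly
        print(descend(alpha=1.1))    # ≈ 192  -> too large: each step overshoots and the error grows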

    All the figures have been taken from Andrew Ng’s machine learning course on coursera.org
    https://www.coursera.org/learn/machine-learning/home/welcome
