Gradient Descent

  • To recap, a neural network makes predictions for a dataset, and we compare those predictions against labels that record the actual outcomes. For example, a network that predicts which students will be admitted to a school is trained on records of who actually was admitted. By comparing each prediction to the actual result, then updating the weights of the model to move the prediction closer to that result, we get a more accurate model.
  • We refine the predictions by reducing the error function. One strategy for reducing the error function is gradient descent. If you can only see 5 feet in front of you, a natural way to descend a mountain is to keep stepping in the direction that slopes downhill most steeply. Minimizing error works the same way: your position on the mountain corresponds to the model's weights, your height to the error, and the bottom of the mountain to the model with a minimized error function.
  • Caveat: Following gradient descent can bring you to a local minimum (a valley partway down) without actually fully descending the mountain.
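
The descent strategy and the local-minimum caveat can be sketched in a few lines of Python. This is only an illustration under assumptions: the one-weight error function E(w) = w^4 - 3w^2 + w, the learning rate, and the starting points are made up for the example, not taken from a real network.

```python
def error(w):
    # Hypothetical error function with two valleys (an assumption for illustration).
    return w**4 - 3 * w**2 + w


def error_gradient(w):
    # Derivative of the error function above: the local slope of the "mountain".
    return 4 * w**3 - 6 * w + 1


def gradient_descent(gradient, start, learning_rate=0.01, steps=1000):
    """Repeatedly take a small step in the downhill direction."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


# Two walkers start on opposite sides of the mountain.
w_right = gradient_descent(error_gradient, start=2.0)   # settles near w = 1.13
w_left = gradient_descent(error_gradient, start=-2.0)   # settles near w = -1.30

# The right-hand valley is only a local minimum: its error is higher.
stuck_in_local_minimum = error(w_right) > error(w_left)
```

Both runs follow the same rule, but the one that starts at 2.0 stops in the shallower valley. The starting point decides which minimum gradient descent finds.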


  • How do you find the error for a multilayer neural network? Work backwards from the output (backpropagation), calculating the error term for each layer and using it to update that layer's weights as you go.
    1. Use knowledge of the activation function to calculate the error term for the output.
    2. Use that result to calculate the error term for the previous layer. Repeat for each earlier layer until you reach the input.
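
The two steps above can be sketched for a tiny network with one hidden layer. This is a minimal sketch under assumptions the notes do not state: a sigmoid activation, a squared-error function, a single output unit, and made-up weights and data. The name `backprop_step` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def backprop_step(x, target, w_hidden, w_output, learning_rate=0.5):
    """One weight update for a network with one hidden layer and one output.

    w_hidden[j][i] is the weight from input i to hidden unit j;
    w_output[j] is the weight from hidden unit j to the output.
    Returns the new weights and the prediction made before the update.
    """
    # Forward pass: inputs -> hidden activations -> output prediction.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_output, hidden)))

    # Step 1: output error term = error times the sigmoid derivative,
    # which for a sigmoid output is output * (1 - output).
    output_error_term = (target - output) * output * (1.0 - output)

    # Step 2: pass the output error term back through the output weights,
    # scaled by each hidden unit's own sigmoid derivative.
    hidden_error_terms = [
        output_error_term * w * h * (1.0 - h) for w, h in zip(w_output, hidden)
    ]

    # Move each weight in proportion to its error term and its input.
    new_w_output = [
        w + learning_rate * output_error_term * h for w, h in zip(w_output, hidden)
    ]
    new_w_hidden = [
        [w + learning_rate * term * xi for w, xi in zip(row, x)]
        for row, term in zip(w_hidden, hidden_error_terms)
    ]
    return new_w_hidden, new_w_output, output


# Made-up starting weights and a single made-up training example.
w_hidden = [[0.5, -0.6], [0.1, -0.2]]
w_output = [0.1, -0.3]
x, target = [0.5, 0.1], 0.6

errors = []
for _ in range(200):
    w_hidden, w_output, prediction = backprop_step(x, target, w_hidden, w_output)
    errors.append(abs(target - prediction))
```

Each pass computes the output error term first and reuses it for the hidden layer, which is the "work backwards" order described above. With more hidden layers, step 2 would be repeated once per layer.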