Training a network is the task of finding the correct weights and biases for a set of neurons. When using a pretrained network, we start with the weights provided. What about when training from scratch?

Two Strategies

Constant Weights

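  • All 0s
    • Multiplying the inputs by zero weights always gives 0, so a neuron's output carries no information about its input
  • All 1s
    • Every neuron computes the same plain sum of its inputs, so all neurons in a layer produce identical outputs
  • Any constant value causes problems during backprop: every neuron in a layer receives the same gradient, so the neurons stay identical and cannot learn different features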
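Below is a minimal sketch of this symmetry problem. It assumes a tiny hypothetical network with 2 inputs, 3 linear hidden neurons (no activation, no bias), one linear output, and squared-error loss. The helpers `forward` and `hidden_weight_grads` are illustrative names and do not come from any library.

```python
def forward(x, W, v):
    """Return hidden activations h and output y for a small linear 2-layer net."""
    # Hidden neuron j computes the weighted sum of the inputs using row W[j].
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    # A single linear output unit combines the hidden activations.
    y = sum(vj * hj for vj, hj in zip(v, h))
    return h, y


def hidden_weight_grads(x, W, v, target):
    """Gradient of L = 0.5 * (y - target)**2 with respect to each W[j][i]."""
    _, y = forward(x, W, v)
    err = y - target  # dL/dy
    # Chain rule: dL/dW[j][i] = dL/dy * dy/dh[j] * dh[j]/dW[j][i] = err * v[j] * x[i]
    return [[err * vj * xi for xi in x] for vj in v]


x = [0.2, 0.7]
target = 1.0

for c in (0.0, 1.0):
    # Every hidden weight and every output weight is set to the same constant c.
    W = [[c, c] for _ in range(3)]
    v = [c, c, c]
    h, _ = forward(x, W, v)
    grads = hidden_weight_grads(x, W, v, target)
    # All hidden neurons produce the same activation...
    print("c =", c, "identical activations:", all(hj == h[0] for hj in h))
    # ...and receive the same gradient, so an update keeps them identical.
    print("c =", c, "identical gradients:", all(g == grads[0] for g in grads))
```

With `c = 0` the hidden-layer gradients are all zero too, because the zero output weights block the error signal. With `c = 1` the gradients are nonzero but identical, so after every update the hidden neurons remain copies of one another.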

Random Weights

  • Pick weights from a random distribution
  • Backprop depends on neurons differing from one another. Random weights break the symmetry, so each neuron receives a different gradient and can learn something different.

The initial weights therefore need an element of randomness.
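Here is a minimal sketch of random initialization using only Python's standard `random` module. The helper name `init_uniform` is hypothetical, and the range $[-0.5, 0.5]$ in the example is chosen arbitrarily for illustration. The next section covers how to pick the range properly.

```python
import random


def init_uniform(n_inputs, n_neurons, low, high, seed=None):
    """Return an n_neurons x n_inputs weight matrix drawn uniformly from [low, high]."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    return [[rng.uniform(low, high) for _ in range(n_inputs)]
            for _ in range(n_neurons)]


W = init_uniform(n_inputs=2, n_neurons=3, low=-0.5, high=0.5, seed=42)
for j, row in enumerate(W):
    print(f"neuron {j}: {row}")
# Each row differs, so each neuron starts from a different point.
print("all rows distinct:", len({tuple(row) for row in W}) == len(W))
```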

Random Distribution Range

  • A good range for the initial weights depends on the number of inputs $n$ that each neuron receives (for the first layer, the number of input features)
  • $y = 1 / \sqrt{n}$, where $n$ is the number of inputs
  • Draw the weights uniformly from the range $[-y, y]$
  • Weights drawn from this range give the model a much lower starting loss
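The following minimal sketch applies the $y = 1/\sqrt{n}$ rule with the standard library. The helper names and the 784-input example layer (for instance a flattened 28x28 image) are assumptions chosen for illustration.

```python
import math
import random


def uniform_limit(n_inputs):
    """y = 1 / sqrt(n), where n is the number of inputs to each neuron."""
    return 1.0 / math.sqrt(n_inputs)


def init_scaled_uniform(n_inputs, n_neurons, seed=None):
    """Draw each weight uniformly from [-y, y] with y = 1 / sqrt(n_inputs)."""
    y = uniform_limit(n_inputs)
    rng = random.Random(seed)
    return [[rng.uniform(-y, y) for _ in range(n_inputs)]
            for _ in range(n_neurons)]


# Example: a layer that receives 784 inputs.
n = 784
y = uniform_limit(n)
print(f"n = {n}, y = {y:.4f}")  # prints n = 784, y = 0.0357
W = init_scaled_uniform(n_inputs=n, n_neurons=256, seed=0)
print("max |w|:", max(abs(w) for row in W for w in row))
```

Because $y$ shrinks as $n$ grows, a neuron with many inputs gets smaller weights. This keeps its weighted sum from growing just because more terms are added.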

Normal Distribution

  • Generally, drawing the initial weights from a normal distribution gives better results than drawing them from a uniform distribution
  • The normal distribution should have mean 0
  • Its standard deviation should be $y = 1 / \sqrt{n}$
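This is a minimal sketch of the normal-distribution variant using `random.Random.gauss` from the standard library. The helper name `init_scaled_normal` and the layer sizes are hypothetical. With about 100,000 samples, the empirical mean should land close to 0 and the empirical standard deviation close to $y$.

```python
import math
import random
import statistics


def init_scaled_normal(n_inputs, n_neurons, seed=None):
    """Draw each weight from a normal distribution with mean 0 and std y = 1/sqrt(n)."""
    y = 1.0 / math.sqrt(n_inputs)
    rng = random.Random(seed)
    return [[rng.gauss(0.0, y) for _ in range(n_inputs)]
            for _ in range(n_neurons)]


n = 784
W = init_scaled_normal(n_inputs=n, n_neurons=128, seed=0)
flat = [w for row in W for w in row]
print("target std y:", 1.0 / math.sqrt(n))
print("sample mean:", statistics.fmean(flat))
print("sample std:", statistics.pstdev(flat))
```

Unlike the uniform version, most weights cluster near 0. A few fall outside $[-y, y]$, since $y$ here is a standard deviation rather than a hard limit.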