Training a network is the task of finding the correct weights and biases for a set of neurons. When using a pretrained network, we start with the weights provided. What about when training from scratch?

## Two Strategies

### Constants

• 0s
• Multiplying inputs by zero weights gives 0 everywhere, so every neuron outputs nothing and produces no signal to learn from
• 1s
• Multiplying inputs by all-ones weights just passes the summed inputs through, so every neuron computes the same value
• Any constant value gives every neuron identical weights, so they all receive identical gradients during back prop and can never learn different features (the symmetry problem)
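A minimal NumPy sketch of the symmetry problem above (the layer shapes are arbitrary, chosen just for illustration): with constant weights, every neuron in a layer computes exactly the same output, so back prop has nothing to distinguish them by.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # 4 samples, 3 input features

W = np.ones((3, 5))           # constant init: 5 neurons, all identical
out = x @ W                   # every neuron computes the same sum

# All 5 output columns are identical, so every neuron would
# receive the same gradient and they could never differentiate.
print(np.allclose(out, out[:, [0]]))  # True
```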

### Random

• Pick weights from a random distribution
• Back prop makes decisions based on differences between weights, so starting from varied random values breaks the symmetry and gives it the information it needs to succeed.

## Random Distribution Range

• There is a relationship between initial weights and the number of inputs the first layer receives.
• $y = 1 / \sqrt{n}$, where $n$ is the number of inputs
• The random distribution should be uniform over $[-y, y]$
• This distribution leads to the model starting with a much lower loss.
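The rule above can be sketched as a small helper (the function name and layer sizes are hypothetical, not from the source): draw each weight uniformly from $[-1/\sqrt{n}, 1/\sqrt{n}]$, where $n$ is the layer's input count.

```python
import numpy as np

def uniform_scaled_init(n_inputs, n_outputs, rng=None):
    """Uniform weights in [-y, y] with y = 1/sqrt(n_inputs)."""
    rng = rng or np.random.default_rng()
    y = 1.0 / np.sqrt(n_inputs)
    return rng.uniform(-y, y, size=(n_inputs, n_outputs))

# e.g. a 784 -> 128 layer: all weights land inside +/- 1/sqrt(784)
W = uniform_scaled_init(784, 128)
```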

## Normal Distribution

• Generally, sampling weights from a normal distribution will yield better results than a uniform one.
• The normal distribution's mean should be 0
• And its standard deviation should be $y = 1 / \sqrt{n}$
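As a sketch of the normal-distribution variant (helper name and sizes are illustrative): sample weights from a normal distribution with mean 0 and standard deviation $1/\sqrt{n}$.

```python
import numpy as np

def normal_scaled_init(n_inputs, n_outputs, rng=None):
    """Normal weights with mean 0 and std = 1/sqrt(n_inputs)."""
    rng = rng or np.random.default_rng()
    y = 1.0 / np.sqrt(n_inputs)
    return rng.normal(loc=0.0, scale=y, size=(n_inputs, n_outputs))

# With a large layer the sample mean and std should be close
# to the targets 0 and 1/sqrt(n_inputs).
W = normal_scaled_init(10_000, 100, rng=np.random.default_rng(1))
```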