# Weight Initialization

Training a network is the task of finding the correct weights and biases for a set of neurons. When using a pretrained network, we start with the weights provided. What about when training from scratch?

## Two Strategies

### Constants

- All 0s
- Multiplying any input by a zero weight gives 0, so every neuron outputs 0

- All 1s
- Multiplying inputs by weights of 1 means every neuron outputs the same sum of its inputs

- Any constant value gives every neuron in a layer the same output and therefore the same gradient, so back prop cannot tell the neurons apart (the symmetry problem)
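A quick sketch of the symmetry problem, using a toy NumPy layer (the layer sizes here are arbitrary, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))  # one sample with 3 features

# Constant (all-1s) initialization: 3 inputs -> 4 hidden units
W = np.ones((3, 4))
h = x @ W  # forward pass, no bias

# Every hidden unit computes the exact same value, so every unit will
# also receive the exact same gradient -- the units can never diverge.
print(h)
```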

### Random

- Pick weights from a random distribution
- Back prop makes decisions based on variety: differences between the random weights give each neuron a distinct output and a distinct gradient, which is the information back prop needs to succeed.

### There *needs* to be an element of randomness
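The same toy layer as before, but with random weights (again, sizes are arbitrary), shows symmetry being broken:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))  # one sample with 3 features

# Small random weights: 3 inputs -> 4 hidden units
W = rng.uniform(-0.1, 0.1, size=(3, 4))
h = x @ W

# The hidden units now compute different values, so back prop can
# update each unit differently and they learn distinct features.
print(h)
```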

## Random Distribution Range

- There is a relationship between initial weights and the number of inputs the first layer receives.
- $y = 1 / \sqrt{n}$ where $n$ is the number of inputs
- Weights should be drawn uniformly from between $-y$ and $y$
- This distribution leads to the model starting with a much lower loss.
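A minimal sketch of this rule as an init helper (the function name and layer sizes are my own, not from a particular library):

```python
import numpy as np

def uniform_init(n_inputs, n_outputs, rng=None):
    """Draw weights uniformly from [-y, y] where y = 1 / sqrt(n_inputs)."""
    if rng is None:
        rng = np.random.default_rng()
    y = 1.0 / np.sqrt(n_inputs)
    return rng.uniform(-y, y, size=(n_inputs, n_outputs))

# e.g. a first layer taking flattened 28x28 images (784 inputs)
W = uniform_init(784, 256, rng=np.random.default_rng(0))
print(W.min(), W.max())  # both fall inside +/- 1/sqrt(784)
```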

## Normal Distribution

- Generally, taking random values from a normal distribution will yield better results than a uniform distribution.
- The normal distribution's mean should be 0
- And its standard deviation should be $y = 1 / \sqrt{n}$
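The same helper adapted to a normal distribution with mean 0 and standard deviation $1/\sqrt{n}$ (names and sizes are again illustrative):

```python
import numpy as np

def normal_init(n_inputs, n_outputs, rng=None):
    """Draw weights from N(0, y^2) where y = 1 / sqrt(n_inputs)."""
    if rng is None:
        rng = np.random.default_rng()
    y = 1.0 / np.sqrt(n_inputs)
    return rng.normal(loc=0.0, scale=y, size=(n_inputs, n_outputs))

W = normal_init(784, 256, rng=np.random.default_rng(0))
# Sample mean should be near 0 and sample std near 1/sqrt(784) ~= 0.0357
print(W.mean(), W.std())
```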