PyTorch is an open source framework from Facebook for building deep learning (DL) models.


A tensor is a generalization of a matrix to any number of dimensions. Tensors are the base data structures of neural network frameworks, so understanding them is a prerequisite to using any DL framework.
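A quick sketch of what "any number of dimensions" means in PyTorch (the shapes here are arbitrary examples):

```python
import torch

scalar = torch.tensor(3.14)         # 0 dimensions
vector = torch.tensor([1.0, 2.0])   # 1 dimension
matrix = torch.ones(3, 4)           # 2 dimensions (the familiar rows x columns)
batch = torch.zeros(64, 1, 28, 28)  # 4 dimensions

print(matrix.shape)  # torch.Size([3, 4])
print(batch.dim())   # 4
```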

Moving away from rows and columns

Once you start looking at tensors with more than 2 dimensions, the row x column mental model falls apart. It's better to think in terms of dimensions.

For example, one of the exercises requires reshaping a tensor of size [64, 1, 28, 28] into [64, 784]. This is accomplished by dropping the size-1 second dimension via squeeze and then merging the two 28 x 28 image dimensions via flatten.
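That reshape can be sketched like this (the input tensor is random stand-in data):

```python
import torch

x = torch.rand(64, 1, 28, 28)   # e.g. a batch of 64 single-channel 28x28 images
x = x.squeeze(1)                # drop the size-1 channel dim -> [64, 28, 28]
x = x.flatten(start_dim=1)      # merge the two image dims   -> [64, 784]
print(x.shape)                  # torch.Size([64, 784])
```

`x.view(64, -1)` (or `x.reshape(64, -1)`) on the original tensor gets the same result in one step, since reshaping doesn't care which size-1 dims you drop along the way.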

Neural Nets in PyTorch

PyTorch exposes the nn module, which makes building neural nets much more convenient. It provides a library of tools implementing the functionality we've been hand-coding up to this point.

  • The nn module allows you to define a neural net’s architecture, and it pretty much handles everything else.
  • Each tensor can track the operations performed on it (autograd), which makes computing gradients for gradient descent easy.
  • Optimizers use those gradients to update the model's weight tensors after each backward pass.
  • Common training pattern in PyTorch:
    • For each batch of training data:
      1. Clear optimizer’s gradients
      2. Forward pass on model
      3. Calculate loss
      4. Backwards pass from loss
      5. Update weights with optimizer
  • Allows you to train against validation set, as well.
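The five-step training pattern above can be sketched as a single batch update. The model, loss, and data here are hypothetical stand-ins, not the exercise's actual network:

```python
import torch
from torch import nn, optim

# Toy setup: a small classifier and random data stand in for a real dataset.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

images = torch.rand(64, 784)
labels = torch.randint(0, 10, (64,))

optimizer.zero_grad()             # 1. clear the optimizer's gradients
logits = model(images)            # 2. forward pass on the model
loss = criterion(logits, labels)  # 3. calculate the loss
loss.backward()                   # 4. backward pass from the loss
optimizer.step()                  # 5. update the weights
```

For validation, you run only steps 2 and 3 inside `torch.no_grad()` so no gradients are tracked.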

New Vocab

  • criterion: The network’s loss function (e.g. nn.CrossEntropyLoss), which measures how wrong the output is
  • logits: The raw, unnormalized output of the network, before softmax

The float caveat

Computers aren’t good at storing arbitrary decimal fractions, because they must store data in base 2 (binary). Some decimal fractions, like $0.1$, have no exact finite binary representation, in the same way that $1/3$ is $0.33333$ with an infinite number of 3s in decimal. Since you can’t store an infinite number of digits, the stored value is only an approximation of the fraction, and any subsequent calculations performed with the float will be impacted by that approximation.
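You can see the approximation directly in any Python session:

```python
import math

print(0.1 + 0.2)                     # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)              # False: both sides are binary approximations
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare floats with a tolerance instead
```

This is why exact equality checks on floats (including loss values and model outputs) are a bad idea; compare with a tolerance instead.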


Some strategies to prevent overfitting:

  • Early stopping. Stop training the model when the validation loss is at a minimum.
  • Dropout. Randomly zero out inputs during training to force the remaining weights to compensate, which creates a more general solution.
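Dropout in PyTorch is a layer that behaves differently in training and evaluation modes, sketched here on a tensor of ones (p=0.5 is an arbitrary choice):

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)  # zero out each input with probability 0.5 during training
x = torch.ones(1, 10)

drop.train()
print(drop(x))  # roughly half the entries are 0; survivors are scaled by 1/(1-p) = 2

drop.eval()
print(drop(x))  # identity at eval time: all ones pass through unchanged
```

The scaling keeps the expected activation magnitude the same in both modes, so no extra adjustment is needed at inference time.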