• The convolutional layers in a CNN take an image as input and output a representation with lower spatial resolution but greater depth. If the original content can be reconstructed from that representation, the convolutional layers can be thought of as a form of compression. Decoders can then be built to recreate the encoded input with useful transformations applied, such as denoising or color alteration.
  • Networks that learn this encoding process are called autoencoders. The ‘auto’ reflects that, in contrast to a JPEG or MP3 encoder that follows explicit, human-created rules, an autoencoder’s rules are determined, or uncovered, by a machine.
  • Autoencoders can be built using linear layers, but since we’re dealing with images, we’ll probably get better results with convolutional layers.
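  • As a minimal sketch of the idea, here is a linear-layer autoencoder in PyTorch (the 784-dimensional input, e.g. a flattened 28x28 grayscale image, and the 32-unit bottleneck are illustrative assumptions):

      import torch
      import torch.nn as nn

      # A linear (fully connected) autoencoder. The bottleneck forces the
      # network to learn a compressed code from which the input can be
      # reconstructed.
      class LinearAutoencoder(nn.Module):
          def __init__(self, in_dim=784, code_dim=32):
              super().__init__()
              self.encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
              self.decoder = nn.Sequential(nn.Linear(code_dim, in_dim), nn.Sigmoid())

          def forward(self, x):
              code = self.encoder(x)      # compressed representation
              return self.decoder(code)   # reconstruction of the input

      model = LinearAutoencoder()
      x = torch.rand(8, 784)              # a fake batch of flattened images
      print(model(x).shape)               # torch.Size([8, 784])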

Autoencoding with CNNs

  • The encoding portion of the convolutional autoencoder is the familiar convolutional layers: Downsampling the input while increasing its depth.
  • The decoding portion is new, but the idea is the same as with a linear network: Reverse the encoding process to recreate the input dimensions.
    • If the input was downsampled with max pooling, you need to ‘unpool’ it via upsampling.
    • Many pooling strategies can be approximately reversed. For example, nearest-neighbor upsampling copies each pixel’s value into the new pixels created around it (see the sketch after this list).
  • Encoder: Wide, shallow -> narrow, deep
  • Decoder: Narrow, deep -> wide, shallow
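  • Below is a sketch of this shape flow in PyTorch, using max pooling to downsample and nearest-neighbor upsampling to reverse it (the channel counts and the 28x28 input size are illustrative assumptions):

      import torch
      import torch.nn as nn

      # Encoder: wide, shallow -> narrow, deep
      encoder = nn.Sequential(
          nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
          nn.ReLU(),
          nn.MaxPool2d(2),                              # -> 16x14x14 (downsample)
          nn.Conv2d(16, 4, kernel_size=3, padding=1),   # -> 4x14x14
          nn.ReLU(),
          nn.MaxPool2d(2),                              # -> 4x7x7
      )

      # Decoder: narrow, deep -> wide, shallow. Nearest-neighbor upsampling
      # copies each pixel's value into the new pixels it creates.
      decoder = nn.Sequential(
          nn.Upsample(scale_factor=2, mode="nearest"),  # 4x7x7 -> 4x14x14
          nn.Conv2d(4, 16, kernel_size=3, padding=1),   # -> 16x14x14
          nn.ReLU(),
          nn.Upsample(scale_factor=2, mode="nearest"),  # -> 16x28x28
          nn.Conv2d(16, 1, kernel_size=3, padding=1),   # -> 1x28x28
          nn.Sigmoid(),
      )

      x = torch.rand(8, 1, 28, 28)
      print(decoder(encoder(x)).shape)                  # torch.Size([8, 1, 28, 28])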

Transpose Convolutional Layers

  • AKA deconvolutional layers
    • But the name is technically inaccurate: They don’t mathematically undo a convolution
  • They simply upsize a given input, using learned filters rather than a fixed rule
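  • A sketch of the upsizing in PyTorch (the channel counts are illustrative assumptions): a kernel of size 2 with stride 2 doubles each spatial dimension, and unlike nearest-neighbor upsampling the filter weights are learned during training.

      import torch
      import torch.nn as nn

      # A transpose convolution upsizes its input with learnable weights.
      upsample = nn.ConvTranspose2d(in_channels=4, out_channels=16,
                                    kernel_size=2, stride=2)
      x = torch.rand(8, 4, 7, 7)
      print(upsample(x).shape)  # torch.Size([8, 16, 14, 14])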