- The convolutional layers in a CNN take an image as input and output data with a smaller spatial resolution but greater depth. Because the original content should be recoverable from the convolutional layers' output, this can be thought of as a form of compression. Decoders can then be built to recreate the encoded input with useful transformations applied, such as denoising or color alteration.
- Tools that aid this encoding process are called autoencoders. The 'auto' is because, in contrast to a JPEG or MP3 encoder, which follows explicit, human-created rules, the rules in an autoencoder are determined, or uncovered, by a machine.
- Autoencoders can be built using linear layers, but since we’re dealing with images, we’ll probably get better results with convolutional layers.
Autoencoding with CNNs
- The encoding portion of a convolutional autoencoder is the familiar stack of convolutional layers: downsampling the input while increasing its depth.
- The decoding portion is new, but it follows the same idea as in a linear network: reverse the encoding process to recreate the input dimensions.
- If the input was downsampled with maxpooling, you need to ‘unpool’ the layers via ‘upsampling’.
- Many pooling strategies can be reversed. For example, to unpool with a nearest-neighbor strategy, take each pixel and copy its value into the new pixels created by upsampling.
- Encoder: Wide, shallow -> narrow, deep
- Decoder: Narrow, deep -> wide, shallow
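The encoder/decoder shape above can be sketched as a small network. This is a minimal sketch assuming PyTorch; the layer sizes (a 28x28 grayscale input, a 7x7x4 code) are illustrative choices, not from the notes. The decoder uses nearest-neighbor upsampling to 'unpool' each maxpooling step.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: wide, shallow -> narrow, deep
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28x1 -> 28x28x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28x16 -> 14x14x16
            nn.Conv2d(16, 4, kernel_size=3, padding=1),   # 14x14x16 -> 14x14x4
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14x4 -> 7x7x4
        )
        # Decoder: narrow, deep -> wide, shallow
        # Nearest-neighbor upsampling copies each pixel into the new
        # pixels created when the resolution doubles.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),  # 7x7x4 -> 14x14x4
            nn.Conv2d(4, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='nearest'),  # 14x14x16 -> 28x28x16
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),                                 # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
x = torch.rand(1, 1, 28, 28)
code = model.encoder(x)
out = model(x)
print(code.shape)  # torch.Size([1, 4, 7, 7])
print(out.shape)   # torch.Size([1, 1, 28, 28])
```

Note how the code is much smaller than the input (7x7x4 = 196 values vs. 28x28 = 784), which is the compression the first bullet describes.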
Transpose Convolutional Layers
- AKA deconvolutional layers
- But that name is technically inaccurate: they don't undo a convolution
- They simply upsize a given input, using learnable weights
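A quick illustration of the upsizing, again assuming PyTorch (the channel counts and input size are arbitrary choices): with `kernel_size=2` and `stride=2`, a transpose convolution doubles the spatial dimensions.

```python
import torch
import torch.nn as nn

# A transpose convolution upsizes its input; it does not invert a
# convolution. Output size per dim: (H - 1) * stride - 2 * padding + kernel_size.
up = nn.ConvTranspose2d(in_channels=4, out_channels=16, kernel_size=2, stride=2)

x = torch.rand(1, 4, 7, 7)   # narrow, deep
y = up(x)                    # wider, shallower spatial grid
print(y.shape)  # torch.Size([1, 16, 14, 14])
```

Unlike nearest-neighbor upsampling, the weights here are trained, so the network learns how to fill in the upsized output rather than just copying pixels.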