Demystifying Variational Autoencoders: Understanding the Inner Workings of VAEs

Introduction
Variational Autoencoders (VAEs), introduced by Kingma and Welling in 2013, are a powerful class of generative models that combine neural networks with probabilistic inference: they learn not only to compress data but also to generate new samples from the learned distribution.
What are Autoencoders?
Traditional autoencoders learn compressed representations with two jointly trained networks (a minimal sketch follows the list):
- Encoder: Compresses the input into a low-dimensional latent representation
- Decoder: Reconstructs the input from that latent representation
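A minimal sketch of this structure in PyTorch. The layer sizes (784 inputs, a 32-dimensional bottleneck) are illustrative assumptions, not values prescribed by the text:

```python
import torch
import torch.nn as nn

# A plain (non-variational) autoencoder: the bottleneck forces a compressed code.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),               # compressed latent code
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),  # reconstruction of the input
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)
```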
The Variational Approach
VAEs extend autoencoders by:
- Encoding each input as a probability distribution over the latent space rather than a single point
- Enabling smooth interpolation between data points in that latent space
- Generating new, realistic samples by decoding latent vectors drawn from a prior
Architecture Deep Dive
Encoder Network
The encoder maps each input to the parameters of an approximate posterior distribution q(z|x) over the latent space, typically the mean and (log-)variance of a diagonal Gaussian.
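A hedged sketch of such an encoder head, predicting the log-variance rather than the variance itself (a common choice for numerical stability); the dimensions are assumptions:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = self.backbone(x)
        return self.fc_mu(h), self.fc_logvar(h)
```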
Reparameterization Trick
Sampling from q(z|x) is not differentiable, so the VAE rewrites each sample as z = μ + σ ⊙ ε with ε ~ N(0, I); the randomness is isolated in ε, allowing gradients to flow through μ and σ during backpropagation.
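In code the trick is essentially a one-liner. The sketch below assumes the encoder outputs `mu` and `logvar` as in the snippet above:

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # z = mu + sigma * eps, with eps ~ N(0, I).
    # The random draw lives in eps, so gradients flow through mu and sigma.
    std = torch.exp(0.5 * logvar)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)     # noise with the same shape as sigma
    return mu + std * eps
```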
Decoder Network
The decoder maps a sampled latent vector z back to data space, parameterizing the likelihood p(x|z); at generation time it is fed latent vectors drawn from the prior instead of from the encoder.
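A sketch of a matching decoder, with dimensions again assumed. The same network doubles as the generator: feeding it z drawn from N(0, I) instead of from the encoder yields new samples.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, latent_dim=32, hidden_dim=256, output_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, output_dim), nn.Sigmoid(),  # pixel probabilities in [0, 1]
        )

    def forward(self, z):
        return self.net(z)

# Generation at inference time: decode latent vectors sampled from the prior.
decoder = Decoder()
z = torch.randn(16, 32)   # 16 samples from N(0, I)
samples = decoder(z)      # 16 generated outputs of 784 values each
```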
The Loss Function
VAE training minimizes the sum of two terms (a worked formula and sketch follow this list):
- Reconstruction Loss: the expected negative log-likelihood of the input under the decoder, measuring how well outputs match inputs
- KL Divergence: a regularizer that pulls the approximate posterior q(z|x) toward the prior p(z) = N(0, I), keeping the latent space well structured
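With the standard VAE notation (μ, σ are the encoder outputs, d is the latent dimensionality, φ and θ are encoder and decoder parameters), the objective being minimized is the negative ELBO, and the KL term has a closed form for a diagonal Gaussian posterior and a standard normal prior:

$$
\mathcal{L}(\theta, \phi; x) = -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big),
\qquad
D_{\mathrm{KL}} = -\tfrac{1}{2}\sum_{j=1}^{d}\big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big).
$$

A minimal sketch of this loss, assuming the decoder ends in a sigmoid so that binary cross-entropy serves as the reconstruction term (mean squared error is another common choice):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    # Reconstruction term: how well the decoder output matches the input.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL term: closed-form divergence between N(mu, sigma^2 I) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```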
Applications
- Image generation
- Data augmentation
- Anomaly detection
- Drug discovery
- Music generation
VAEs vs. GANs
| Aspect | VAE | GAN |
|--------|-----|-----|
| Training | Stable | Can be unstable |
| Output Quality | Smooth, sometimes blurry | Sharp, realistic |
| Latent Space | Structured | Less structured |
Conclusion
VAEs offer a principled approach to generative modeling with well-defined latent spaces and stable training dynamics.