Variational autoencoder

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling. It is part of the families of probabilistic graphical models and variational Bayesian methods.[1]

The basic scheme of a variational autoencoder. The model receives x as input. The encoder compresses it into the latent space. The decoder receives as input the information sampled from the latent space and produces x′ as similar as possible to x.

In addition to being seen as an autoencoder neural network architecture, variational autoencoders can also be studied within the mathematical formulation of variational Bayesian methods, connecting a neural encoder network to its decoder through a probabilistic latent space (for example, as a multivariate Gaussian distribution) that corresponds to the parameters of a variational distribution.

Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data.[2] Both networks are typically trained together using the reparameterization trick, although the variance of the noise model can be learned separately.

Although this type of model was initially designed for unsupervised learning,[3][4] its effectiveness has been proven for semi-supervised learning[5][6] and supervised learning.[7]

Overview of architecture and operation

A variational autoencoder is a generative model with a prior p_θ(z) and a noise distribution p_θ(x|z). Usually such models are trained using the expectation-maximization meta-algorithm (e.g. probabilistic PCA, (spike & slab) sparse coding). Such a scheme optimizes a lower bound of the data likelihood, which is usually intractable, and in doing so requires the discovery of q-distributions, or variational posteriors. These q-distributions are normally parameterized for each individual data point in a separate optimization process. However, variational autoencoders use a neural network as an amortized approach to jointly optimize across data points. This neural network takes as input the data points themselves, and outputs parameters for the variational distribution. As it maps from a known input space to the low-dimensional latent space, it is called the encoder.

The decoder is the second neural network of this model. It is a function that maps from the latent space to the input space, e.g. as the means of the noise distribution. It is possible to use another neural network that maps to the variance; however, this can be omitted for simplicity. In such a case, the variance can be optimized with gradient descent.

To optimize this model, one needs to know two terms: the "reconstruction error", and the Kullback–Leibler divergence (KL-D). Both terms are derived from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data. For example, a standard VAE task such as ImageNet is typically assumed to have Gaussian-distributed noise, whereas tasks such as binarized MNIST require Bernoulli noise. The KL-D from the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and requires a sampling approximation to compute its expectation value.[8]
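In practice these two terms are written out directly as the training loss. The following is a minimal sketch, assuming PyTorch and a diagonal-Gaussian variational posterior; the names x_hat, mu and logvar are illustrative stand-ins for the decoder output and the encoder outputs, not part of the original formulation.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, noise="gaussian"):
    """Negative ELBO sketch: reconstruction term + KL term.

    x          -- input batch
    x_hat      -- decoder output (mean of the noise distribution)
    mu, logvar -- parameters of the diagonal Gaussian q_phi(z|x)
    """
    if noise == "gaussian":
        # Gaussian noise model (e.g. natural images): -log p(x|z) is,
        # up to constants, proportional to the squared error.
        recon = F.mse_loss(x_hat, x, reduction="sum")
    else:
        # Bernoulli noise model (e.g. binarized MNIST): binary cross-entropy.
        recon = F.binary_cross_entropy(x_hat, x, reduction="sum")

    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```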

Formulation

From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data x by their chosen parameterized probability distribution p_θ(x) = p(x|θ). This distribution is usually chosen to be a Gaussian N(x; μ, σ) which is parameterized by μ and σ respectively, and as a member of the exponential family it is easy to work with as a noise distribution. Simple distributions are easy enough to maximize; however, distributions where a prior is assumed over the latents z result in intractable integrals. Let us find p_θ(x) via marginalizing over z.

p_θ(x) = ∫_z p_θ(x, z) dz

where p_θ(x, z) represents the joint distribution under p_θ of the observable data x and its latent representation or encoding z. According to the chain rule, the equation can be rewritten as

p_θ(x) = ∫_z p_θ(x|z) p_θ(z) dz

In the vanilla variational autoencoder, z is usually taken to be a finite-dimensional vector of real numbers, and p_θ(x|z) to be a Gaussian distribution. Then p_θ(x) is a mixture of Gaussian distributions.
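The marginal above also describes how a trained model generates new data: draw a latent code from the prior and decode it. A minimal sketch, assuming PyTorch and a standard Gaussian prior over z (the common choice introduced later in the article); decoder is a hypothetical network mapping z to the mean of the noise distribution.

```python
import torch

def sample_from_vae(decoder, latent_dim=20, n_samples=16, noise_std=0.0):
    """Ancestral sampling: z ~ p(z) = N(0, I), then x from p_theta(x|z)."""
    z = torch.randn(n_samples, latent_dim)    # sample from the standard Gaussian prior
    x_mean = decoder(z)                       # decoder returns the mean of p_theta(x|z)
    if noise_std > 0:                         # optionally add Gaussian observation noise
        x_mean = x_mean + noise_std * torch.randn_like(x_mean)
    return x_mean
```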

It is now possible to define the set of the relationships between the input data and its latent representation as

  • Prior p_θ(z)
  • Likelihood p_θ(x|z)
  • Posterior p_θ(z|x)

Unfortunately, the computation of p_θ(z|x) is expensive and in most cases intractable. To make the calculation feasible, it is necessary to introduce a further function to approximate the posterior distribution as

q_φ(z|x) ≈ p_θ(z|x)

with φ defined as the set of real values that parametrize q. This is sometimes called amortized inference, since by "investing" in finding a good q_φ, one can later infer z from x quickly without doing any integrals.

In this way, the problem is to find a good probabilistic autoencoder, in which the conditional likelihood distribution p_θ(x|z) is computed by the probabilistic decoder, and the approximated posterior distribution q_φ(z|x) is computed by the probabilistic encoder.

Parametrize the encoder as E_φ, and the decoder as D_θ.
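A minimal sketch of how E_φ and D_θ might be realized as neural networks, assuming PyTorch and fully connected layers; the layer sizes, the use of a sigmoid output, and the names input_dim, hidden_dim and latent_dim are illustrative choices, not part of the original formulation.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """E_phi: maps an input x to the parameters (mu, log sigma^2) of q_phi(z|x)."""
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """D_theta: maps a latent code z to the mean of the noise distribution p_theta(x|z)."""
    def __init__(self, latent_dim=20, hidden_dim=400, output_dim=784):
        super().__init__()
        self.hidden = nn.Linear(latent_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = torch.relu(self.hidden(z))
        return torch.sigmoid(self.out(h))  # e.g. pixel means in [0, 1] for a Bernoulli model
```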

Evidence lower bound (ELBO)

As in every deep learning problem, it is necessary to define a differentiable loss function in order to update the network weights through backpropagation.

For variational autoencoders, the idea is to jointly optimize the generative model parameters θ to reduce the reconstruction error between the input and the output, and φ to make q_φ(z|x) as close as possible to p_θ(z|x). As reconstruction loss, mean squared error and cross entropy are often used.

As distance loss between the two distributions, the Kullback–Leibler divergence D_KL(q_φ(z|x) ∥ p_θ(z|x)) is a good choice to squeeze q_φ(z|x) under p_θ(z|x).[8][9]

The distance loss just defined is expanded as

D_KL(q_φ(z|x) ∥ p_θ(z|x)) = E_{z~q_φ(·|x)}[ ln( q_φ(z|x) / p_θ(z|x) ) ]
                          = E_{z~q_φ(·|x)}[ ln( q_φ(z|x) p_θ(x) / p_θ(x, z) ) ]
                          = ln p_θ(x) + E_{z~q_φ(·|x)}[ ln( q_φ(z|x) / p_θ(x, z) ) ]

Now define the evidence lower bound (ELBO):

L_{θ,φ}(x) := E_{z~q_φ(·|x)}[ ln( p_θ(x, z) / q_φ(z|x) ) ] = ln p_θ(x) − D_KL( q_φ(·|x) ∥ p_θ(·|x) )
Maximizing the ELBO

θ*, φ* = argmax_{θ,φ} L_{θ,φ}(x)

is equivalent to simultaneously maximizing ln p_θ(x) and minimizing D_KL( q_φ(z|x) ∥ p_θ(z|x) ). That is, maximizing the log-likelihood of the observed data, and minimizing the divergence of the approximate posterior q_φ(·|x) from the exact posterior p_θ(·|x).

The form given is not very convenient for maximization, but the following equivalent form is:

L_{θ,φ}(x) = E_{z~q_φ(·|x)}[ ln p_θ(x|z) ] − D_KL( q_φ(·|x) ∥ p_θ(·) )

where ln p_θ(x|z) is implemented as −(1/2)‖x − D_θ(z)‖₂², since that is, up to an additive constant, what x ~ N(D_θ(z), I) yields. That is, we model the distribution of x conditional on z as a Gaussian distribution centered on D_θ(z). The distributions q_φ(z|x) and p_θ(z) are often also chosen to be Gaussians, with z|x ~ N(E_φ(x), σ_φ(x)²I) and z ~ N(0, I), from which we obtain, by the formula for the KL divergence of Gaussians:

L_{θ,φ}(x) = −(1/2) E_{z~q_φ(·|x)}[ ‖x − D_θ(z)‖₂² ] − (1/2)( N σ_φ(x)² + ‖E_φ(x)‖₂² − 2N ln σ_φ(x) ) + Const

Here N is the dimension of z. For a more detailed derivation and more interpretations of ELBO and its maximization, see its main page.
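For reference, the KL term above follows from the standard formula for the KL divergence between Gaussians; in the isotropic case used here, writing μ = E_φ(x) and σ = σ_φ(x):

```latex
D_{KL}\bigl(\mathcal{N}(\mu,\sigma^{2}I)\,\|\,\mathcal{N}(0,I)\bigr)
  = \tfrac{1}{2}\sum_{i=1}^{N}\left(\sigma^{2} + \mu_{i}^{2} - 1 - \ln\sigma^{2}\right)
  = \tfrac{1}{2}\left(N\sigma^{2} + \lVert\mu\rVert_{2}^{2} - 2N\ln\sigma\right) - \tfrac{N}{2}
```

so, after absorbing the constant −N/2 into Const, its negative is exactly the second term of the expression above.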

Reparameterization

 
The scheme of the reparameterization trick. The randomness variable ε is injected into the latent space z as external input. In this way, it is possible to backpropagate the gradient without involving the stochastic variable during the update.

To efficiently search for

θ*, φ* = argmax_{θ,φ} L_{θ,φ}(x)

the typical method is gradient descent.

It is straightforward to find

∇_θ E_{z~q_φ(·|x)}[ ln( p_θ(x|z) / q_φ(z|x) ) ] = E_{z~q_φ(·|x)}[ ∇_θ ln( p_θ(x|z) / q_φ(z|x) ) ]
However,

∇_φ E_{z~q_φ(·|x)}[ ln( p_θ(x|z) / q_φ(z|x) ) ]

does not allow one to put the ∇_φ inside the expectation, since φ appears in the probability distribution itself. The reparameterization trick (also known as stochastic backpropagation[10]) bypasses this difficulty.[8][11][12]

The most important example is when z ~ q_φ(·|x) is normally distributed, as N(μ_φ(x), Σ_φ(x)).

 
The scheme of a variational autoencoder after the reparameterization trick

This can be reparametrized by letting ε ~ N(0, I) be a "standard random number generator", and constructing z as z = μ_φ(x) + L_φ(x)ε. Here, L_φ(x) is obtained by the Cholesky decomposition:

Σ_φ(x) = L_φ(x) L_φ(x)ᵀ
Then we have

∇_φ E_{z~q_φ(·|x)}[ ln( p_θ(x|z) / q_φ(z|x) ) ] = E_ε[ ∇_φ ln( p_θ(x | μ_φ(x) + L_φ(x)ε) / q_φ(μ_φ(x) + L_φ(x)ε | x) ) ]

and so we obtain an unbiased estimator of the gradient, allowing stochastic gradient descent.
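A minimal sketch of the trick, assuming PyTorch. The first function uses the common diagonal-covariance case, where L_φ(x) reduces to a diagonal matrix of standard deviations; mu and logvar are assumed to come from an encoder such as the one sketched earlier. The second shows the full-covariance variant with an explicit Cholesky factor.

```python
import torch

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, diag(exp(logvar))) as z = mu + sigma * eps, eps ~ N(0, I).

    The randomness lives entirely in eps, so gradients with respect to the
    encoder outputs (mu, logvar) flow through the deterministic transformation.
    """
    std = torch.exp(0.5 * logvar)   # sigma = exp(logvar / 2)
    eps = torch.randn_like(std)     # external source of randomness
    return mu + std * eps

def reparameterize_full(mu, L):
    """Full-covariance variant for a single data point: z = mu + L eps, Sigma = L L^T."""
    eps = torch.randn(mu.shape[-1])
    return mu + L @ eps
```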

Since we reparametrized z, we need to find q_φ(z|x). Let q₀ be the probability density function for ε. Then

ln q_φ(z|x) = ln q₀(ε) − ln |det(∂z/∂ε)|

where ∂z/∂ε is the Jacobian matrix of z with respect to ε. Since z = μ_φ(x) + L_φ(x)ε, this is

ln q_φ(z|x) = −(1/2)‖ε‖² − ln |det L_φ(x)| − (n/2) ln(2π)
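As a sanity check, this density formula can be compared against a library implementation of the multivariate normal; a small sketch assuming PyTorch, with mu and L standing in for hypothetical encoder outputs at a single data point.

```python
import math
import torch

torch.manual_seed(0)
n = 3
mu = torch.randn(n)
A = torch.randn(n, n)
L = torch.linalg.cholesky(A @ A.T + n * torch.eye(n))  # a valid Cholesky factor

eps = torch.randn(n)
z = mu + L @ eps

# Formula from the text: ln q(z|x) = -1/2 ||eps||^2 - ln|det L| - n/2 ln(2*pi)
log_q_manual = -0.5 * eps.dot(eps) - torch.logdet(L) - 0.5 * n * math.log(2 * math.pi)

# Library reference: multivariate normal with covariance L L^T
log_q_ref = torch.distributions.MultivariateNormal(mu, scale_tril=L).log_prob(z)

print(torch.allclose(log_q_manual, log_q_ref))  # True (up to floating-point error)
```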

Variations

Many applications and extensions of variational autoencoders have been developed to adapt the architecture to other domains and improve its performance.

β-VAE is an implementation with a weighted Kullback–Leibler divergence term to automatically discover and interpret factorised latent representations. With this implementation, it is possible to force manifold disentanglement for β values greater than one. This architecture can discover disentangled latent factors without supervision.[13][14]
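The modification relative to the standard objective is only a weighting of the KL term; a minimal sketch, reusing the assumptions of the earlier loss sketch (PyTorch, diagonal-Gaussian posterior, Bernoulli noise model), with beta as the weighting hyperparameter.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """beta-VAE objective: reconstruction term plus the KL term weighted by beta.

    With beta = 1 this reduces to the standard (negative) ELBO; beta > 1
    puts extra pressure on the KL term, encouraging disentangled latents.
    """
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")     # Bernoulli noise model
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q_phi(z|x) || N(0, I))
    return recon + beta * kl
```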

The conditional VAE (CVAE) inserts label information in the latent space to force a deterministic constrained representation of the learned data.[15]

Some structures directly deal with the quality of the generated samples[16][17] or implement more than one latent space to further improve the representation learning.

Some architectures mix VAE and generative adversarial networks to obtain hybrid models.[18][19][20]

See also

  • Autoencoder
  • Artificial neural network
  • Deep learning
  • Generative adversarial network
  • Representation learning
  • Sparse dictionary learning
  • Data augmentation
  • Backpropagation

References

  1. ^ Pinheiro Cinelli, Lucas; et al. (2021). "Variational Autoencoder". Variational Methods for Machine Learning with Applications to Deep Networks. Springer. pp. 111–149. doi:10.1007/978-3-030-70679-1_5. ISBN 978-3-030-70681-4. S2CID 240802776.
  2. ^ Rocca, Joseph (2021-03-21). "Understanding Variational Autoencoders (VAEs)". Medium.
  3. ^ Dilokthanakul, Nat; Mediano, Pedro A. M.; Garnelo, Marta; Lee, Matthew C. H.; Salimbeni, Hugh; Arulkumaran, Kai; Shanahan, Murray (2017-01-13). "Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders". arXiv:1611.02648 [cs.LG].
  4. ^ Hsu, Wei-Ning; Zhang, Yu; Glass, James (December 2017). "Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation". 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). pp. 16–23. arXiv:1707.06265. doi:10.1109/ASRU.2017.8268911. ISBN 978-1-5090-4788-8. S2CID 22681625.
  5. ^ Ehsan Abbasnejad, M.; Dick, Anthony; van den Hengel, Anton (2017). Infinite Variational Autoencoder for Semi-Supervised Learning. pp. 5888–5897.
  6. ^ Xu, Weidi; Sun, Haoze; Deng, Chao; Tan, Ying (2017-02-12). "Variational Autoencoder for Semi-Supervised Text Classification". Proceedings of the AAAI Conference on Artificial Intelligence. 31 (1). doi:10.1609/aaai.v31i1.10966. S2CID 2060721.
  7. ^ Kameoka, Hirokazu; Li, Li; Inoue, Shota; Makino, Shoji (2019-09-01). "Supervised Determined Source Separation with Multichannel Variational Autoencoder". Neural Computation. 31 (9): 1891–1914. doi:10.1162/neco_a_01217. PMID 31335290. S2CID 198168155.
  8. ^ a b c Kingma, Diederik P.; Welling, Max (2013-12-20). "Auto-Encoding Variational Bayes". arXiv:1312.6114 [stat.ML].
  9. ^ "From Autoencoder to Beta-VAE". Lil'Log. 2018-08-12.
  10. ^ Rezende, Danilo Jimenez; Mohamed, Shakir; Wierstra, Daan (2014-06-18). "Stochastic Backpropagation and Approximate Inference in Deep Generative Models". International Conference on Machine Learning. PMLR: 1278–1286. arXiv:1401.4082.
  11. ^ Bengio, Yoshua; Courville, Aaron; Vincent, Pascal (2013). "Representation Learning: A Review and New Perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1798–1828. arXiv:1206.5538. doi:10.1109/TPAMI.2013.50. ISSN 1939-3539. PMID 23787338. S2CID 393948.
  12. ^ Kingma, Diederik P.; Rezende, Danilo J.; Mohamed, Shakir; Welling, Max (2014-10-31). "Semi-Supervised Learning with Deep Generative Models". arXiv:1406.5298 [cs.LG].
  13. ^ Higgins, Irina; Matthey, Loic; Pal, Arka; Burgess, Christopher; Glorot, Xavier; Botvinick, Matthew; Mohamed, Shakir; Lerchner, Alexander (2016-11-04). "beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework".
  14. ^ Burgess, Christopher P.; Higgins, Irina; Pal, Arka; Matthey, Loic; Watters, Nick; Desjardins, Guillaume; Lerchner, Alexander (2018-04-10). "Understanding disentangling in β-VAE". arXiv:1804.03599 [stat.ML].
  15. ^ Sohn, Kihyuk; Lee, Honglak; Yan, Xinchen (2015-01-01). "Learning Structured Output Representation using Deep Conditional Generative Models" (PDF).
  16. ^ Dai, Bin; Wipf, David (2019-10-30). "Diagnosing and Enhancing VAE Models". arXiv:1903.05789 [cs.LG].
  17. ^ Dorta, Garoe; Vicente, Sara; Agapito, Lourdes; Campbell, Neill D. F.; Simpson, Ivor (2018-07-31). "Training VAEs Under Structured Residuals". arXiv:1804.01050 [stat.ML].
  18. ^ Larsen, Anders Boesen Lindbo; Sønderby, Søren Kaae; Larochelle, Hugo; Winther, Ole (2016-06-11). "Autoencoding beyond pixels using a learned similarity metric". International Conference on Machine Learning. PMLR: 1558–1566. arXiv:1512.09300.
  19. ^ Bao, Jianmin; Chen, Dong; Wen, Fang; Li, Houqiang; Hua, Gang (2017). "CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training". pp. 2745–2754. arXiv:1703.10155 [cs.CV].
  20. ^ Gao, Rui; Hou, Xingsong; Qin, Jie; Chen, Jiaxin; Liu, Li; Zhu, Fan; Zhang, Zhao; Shao, Ling (2020). "Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning". IEEE Transactions on Image Processing. 29: 3665–3680. Bibcode:2020ITIP...29.3665G. doi:10.1109/TIP.2020.2964429. ISSN 1941-0042. PMID 31940538. S2CID 210334032.
