Generative Adversarial Networks (GANs) - Deep Learning Framework

A comprehensive overview of Generative Adversarial Networks, a revolutionary framework for estimating generative models through adversarial training of generator and discriminator networks.

Abstract

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game.

In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

1. Introduction

The promise of deep learning is to discover rich, hierarchical models that represent probability distributions over the kinds of data encountered in artificial intelligence applications, such as natural images, audio waveforms containing speech, and symbols in natural language corpora. So far, the most striking successes in deep learning have involved discriminative models, usually those that map a high-dimensional, rich sensory input to a class label. These striking successes have primarily been based on the backpropagation and dropout algorithms, using piecewise linear units which have a particularly well-behaved gradient.

Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and due to the difficulty of leveraging the benefits of piecewise linear units in the generative context. We propose a new generative model estimation procedure that sidesteps these difficulties.

The Counterfeiter-Police Analogy

In the proposed adversarial nets framework, the generative model is pitted against an adversary: a discriminative model that learns to determine whether a sample is from the model distribution or the data distribution. The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.

This framework can yield specific training algorithms for many kinds of model and optimization algorithm. In this article, we explore the special case when the generative model generates samples by passing random noise through a multilayer perceptron, and the discriminative model is also a multilayer perceptron. We refer to this special case as adversarial nets. In this case, we can train both models using only the highly successful backpropagation and dropout algorithms and sample from the generative model using only forward propagation. No approximate inference or Markov chains are necessary.

3. Methodology

3.1 Adversarial Nets Framework

The adversarial modeling framework is most straightforward to apply when the models are both multilayer perceptrons. To learn the generator's distribution p_g over data x, we define a prior on input noise variables p_z(z), then represent a mapping to data space as G(z; θ_g), where G is a differentiable function represented by a multilayer perceptron with parameters θ_g.

We also define a second multilayer perceptron D(x; θ_d) that outputs a single scalar. D(x) represents the probability that x came from the data rather than p_g. We train D to maximize the probability of assigning the correct label to both training examples and samples from G. We simultaneously train G to minimize log(1 − D(G(z))).
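
To make this concrete, here is a minimal PyTorch sketch of the two perceptrons. The layer widths, ReLU activations, and the 784-dimensional data space (e.g., flattened 28×28 images) are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps noise z ~ p_z(z) to data space: G(z; theta_g)."""
    def __init__(self, noise_dim=100, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Outputs the single scalar D(x): probability that x came from the data."""
    def __init__(self, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),  # scalar in (0, 1)
        )

    def forward(self, x):
        return self.net(x)
```

Sampling from a trained generator then needs only forward propagation, e.g. Generator()(torch.randn(64, 100)).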

3.2 Theoretical Foundation

The adversarial nets framework corresponds to a minimax game with value function V(D, G):

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]

In the case where G and D have enough capacity, the training criterion allows one to recover the data-generating distribution. In particular, for a fixed G the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). The theoretical analysis shows that this minimax game has a global optimum at p_g = p_data, where D*(x) = 1/2 everywhere, and that under mild conditions G and D converge to this optimum.
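
For intuition, V(D, G) can be estimated on a minibatch by Monte Carlo. A sketch reusing the Generator and Discriminator classes above; the Gaussian prior for p_z and the small eps floor are assumptions made for numerical convenience:

```python
import torch

def value_fn(D, G, real_x, noise_dim=100):
    """Monte Carlo estimate of
    V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]."""
    z = torch.randn(real_x.size(0), noise_dim)  # assumed prior: p_z = N(0, I)
    eps = 1e-8                                  # keeps the logs finite
    return (torch.log(D(real_x) + eps).mean()
            + torch.log(1 - D(G(z)) + eps).mean())
```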

3.3 Practical Implementation

In practice, we must implement the game using an iterative, numerical approach. Optimizing D to completion in the inner loop of training is computationally prohibitive and on finite datasets would result in overfitting. Instead, we alternate between k steps of optimizing D and one step of optimizing G. This results in D being maintained near its optimal solution, as long as G changes slowly enough.
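
A minimal sketch of this alternating scheme, assuming an endless data_iter that yields training batches; the momentum-SGD optimizer, learning rate, and k are placeholders rather than the paper's settings:

```python
import torch

def train(D, G, data_iter, steps, k=1, noise_dim=100, lr=1e-3):
    opt_d = torch.optim.SGD(D.parameters(), lr=lr, momentum=0.9)
    opt_g = torch.optim.SGD(G.parameters(), lr=lr, momentum=0.9)
    eps = 1e-8  # keeps the logs finite when D saturates
    for _ in range(steps):
        # k steps of ascent on D: maximize log D(x) + log(1 - D(G(z)))
        for _ in range(k):
            real_x = next(data_iter)                    # batch from p_data
            z = torch.randn(real_x.size(0), noise_dim)  # batch from p_z
            fake_x = G(z).detach()                      # no G update here
            loss_d = -(torch.log(D(real_x) + eps).mean()
                       + torch.log(1 - D(fake_x) + eps).mean())
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()
        # one step of descent on G: minimize log(1 - D(G(z)))
        z = torch.randn(real_x.size(0), noise_dim)
        loss_g = torch.log(1 - D(G(z)) + eps).mean()
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
```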

The training procedure for D is to maximize the probability of assigning the correct label to both training examples and fake samples from G. We can simultaneously update G either by minimizing log(1 − D(G(z))) or by maximizing log D(G(z)). The latter objective provides much stronger gradients early in learning, when D can easily reject samples from G because they are clearly different from the training data.
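
In code, the two generator objectives differ only in how gradients flow through the logarithm. A short sketch, assuming the D and G networks from the earlier example:

```python
import torch

batch_size, noise_dim, eps = 128, 100, 1e-8  # illustrative values
z = torch.randn(batch_size, noise_dim)
d_fake = D(G(z))  # D, G as defined in the earlier sketch
# Saturating form: minimize log(1 - D(G(z))). Its gradient vanishes
# when D confidently rejects G's samples (d_fake near 0).
loss_saturating = torch.log(1 - d_fake + eps).mean()
# Non-saturating form: maximize log D(G(z)) by minimizing its negative.
# Gradients stay large early in training.
loss_nonsaturating = -torch.log(d_fake + eps).mean()
```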

4. Experimental Results

Model performance at a glance:

  • Sample quality: generated samples demonstrate remarkable visual quality and diversity
  • Training efficiency (no MCMC): training requires no Markov chains during learning or sampling
  • Architecture (MLP-based): both generator and discriminator implemented as multilayer perceptrons

4.1 Datasets and Evaluation

We trained adversarial nets on a range of datasets including MNIST, the Toronto Face Database (TFD), and CIFAR-10. The generative models were able to produce convincing samples from all of these datasets. Quantitative results were obtained by applying the Gaussian Parzen window-based log-likelihood estimation method to generated samples.
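
As a sketch of that estimator: fit an isotropic Gaussian kernel density to generated samples and score held-out test points. In practice the bandwidth sigma is chosen by cross-validation; the default below is only a placeholder.

```python
import math
import torch

def parzen_log_likelihood(samples, test_x, sigma=0.2):
    """Mean log-likelihood of `test_x` under a Gaussian Parzen window
    fit to `samples`. samples: (n, d) generated points; test_x: (m, d)."""
    n, d = samples.shape
    # Squared distances between every test point and every sample: (m, n)
    sq_dists = torch.cdist(test_x, samples).pow(2)
    log_kernels = -sq_dists / (2 * sigma ** 2)
    log_norm = math.log(n) + d / 2 * math.log(2 * math.pi * sigma ** 2)
    # Stable log-mean-exp over samples, averaged over test points
    return (torch.logsumexp(log_kernels, dim=1) - log_norm).mean()
```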

4.2 Sample Quality

The generated samples show that adversarial nets are able to generate sharp, visually appealing samples. The samples demonstrate that the model has learned relevant features and can generate novel examples that capture the essence of the training data distribution.

4.3 Advantages Over Previous Methods

Compared to previous generative modeling approaches, adversarial nets offer several advantages:

  • No need for approximate inference or Markov chains
  • Very few restrictions on model architecture
  • Can represent very sharp, even degenerate distributions
  • No cooperation between the networks is needed to draw samples: generation is a single forward pass through G
  • Can be trained with pure backpropagation

5. Discussion

5.1 Advantages and Disadvantages

Advantages:

  • There is no need for any Markov chains
  • No inference is required during learning
  • A wide variety of functions can be incorporated into the model
  • The generator network is not updated directly with data examples, but only with gradients flowing through the discriminator

Disadvantages:

  • There is no explicit representation of p_g(x)
  • D must be synchronized well with G during training
  • The training dynamics can be unstable

5.2 Future Research Directions

This framework admits many straightforward extensions:

  • A conditional generative model p(x|c) can be obtained by adding c as input to both G and D (see the sketch after this list)
  • Learned approximate inference can be performed by training an auxiliary network to predict z given x
  • One can approximately model all conditionals p(x_S | x_{not S}) where S is a subset of the indices of x
  • Semi-supervised learning: features from the discriminator or inference network could improve the performance of classifiers when limited labeled data is available
  • Efficiency improvements: better methods of coordinating G and D or determining better distributions to sample z from during training
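
As one example, the conditional extension in the first bullet amounts to concatenating c onto the usual inputs of both networks. A hypothetical sketch, with one-hot conditioning and layer sizes as illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """G(z, c): noise plus conditioning variable c mapped to data space."""
    def __init__(self, noise_dim=100, cond_dim=10, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Sigmoid(),
        )

    def forward(self, z, c):
        return self.net(torch.cat([z, c], dim=1))

class ConditionalDiscriminator(nn.Module):
    """D(x, c): probability that x is real, given the same conditioning c."""
    def __init__(self, data_dim=784, cond_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, c):
        return self.net(torch.cat([x, c], dim=1))
```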

6. Conclusion

This work introduces the adversarial modeling framework for estimating generative models via an adversarial process. The framework enables the training of generative models that were previously difficult to fit because of the intractable probabilistic computations involved in maximum likelihood estimation and related strategies. When the generative model is defined by a multilayer perceptron, the entire system can be trained with backpropagation.

The key advantage of this approach is that neither Markov chains nor approximate inference networks are required during training or generation of samples. Experimental results demonstrate the potential of this framework through qualitative and quantitative evaluation of generated samples.

The adversarial modeling framework opens up many directions for future work, including applications to semi-supervised learning, efficiency improvements, and theoretical analysis of the convergence properties of the training procedure.

Key Insights

Novel Training Paradigm

The adversarial training process represents a fundamentally new approach to training generative models, framing the problem as a two-player minimax game between generator and discriminator networks.

Elimination of MCMC

Unlike previous generative modeling approaches, adversarial nets require no Markov chain Monte Carlo methods during training or sampling, significantly improving computational efficiency.

Backpropagation Compatibility

The entire adversarial system can be trained using standard backpropagation algorithms, making it compatible with existing deep learning toolchains and infrastructure.

Theoretical Guarantees

Under appropriate conditions, the adversarial training process is guaranteed to converge to the optimal solution where the generator perfectly replicates the data distribution.

Flexible Architecture

The framework imposes minimal restrictions on model architecture, allowing for the use of various neural network designs for both generator and discriminator components.

Practical Applications

The adversarial framework has broad applicability across domains including image generation, semi-supervised learning, and representation learning.

Reference: Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Nets. In Advances in Neural Information Processing Systems.

Code Availability: All code and hyperparameters available at http://www.github.com/goodfeli/adversarial

arXiv: arXiv:1406.2661v1 [stat.ML] 10 Jun 2014