Deep generative models are at the core of research in artificial intelligence, especially for unlabelled data. They have achieved remarkable performance in domains including computer vision, natural language processing, speech recognition, and audio synthesis. More recently, deep generative models have been applied to broader domains, including the natural sciences (physics, chemistry, and molecular biology) and medicine. However, deep generative models still face challenges when applied to these domains, which give rise to highly structured data. This workshop aims to bring together experts from different backgrounds and perspectives to discuss the application of deep generative models to these data modalities. The workshop will emphasize the challenges of encoding domain knowledge when learning representations, performing synthesis, or making predictions. Since evaluation is essential for benchmarking, the workshop will also be a platform for discussing rigorous ways to evaluate representations and synthesis.
Fri 6:00 a.m. - 6:10 a.m. | Opening Remarks
Fri 6:10 a.m. - 7:00 a.m. | Invited talk by Prof. Max Welling (Invited Talk)
Fri 7:00 a.m. - 7:50 a.m. | Invited talk by Dr. Ellen Zhong (Invited Talk)
Fri 8:00 a.m. - 8:15 a.m. | Bayesian Structure Learning with Generative Flow Networks (Oral)
In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks from data. Defining such a distribution is very challenging due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets), has been introduced as a general framework for generative modeling of discrete and composite objects, such as graphs. In this work, we propose to use a GFlowNet as an alternative to MCMC for approximating the posterior distribution over the structure of Bayesian networks, given a dataset of observations. Generating a sample DAG from this approximate distribution is viewed as a sequential decision problem, where the graph is constructed one edge at a time, based on learned transition probabilities. Through evaluation on both simulated and real data, we show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs, and that it compares favorably against other methods based on MCMC or variational inference.
Tristan Deleu · António Góis · Chris Emezue · Mansi Rankawat · Simon Lacoste-Julien · Stefan Bauer · Yoshua Bengio
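To make the sequential-decision view concrete, here is a minimal sketch of building a DAG one edge at a time while rejecting edges that would create a cycle. The `edge_probs` table and the fixed stopping probability stand in for the learned GFlowNet policy; this is an illustration of the construction process, not the authors' DAG-GFlowNet.

```python
import numpy as np

def creates_cycle(adj, u, v):
    """Adding edge u -> v creates a cycle iff v can already reach u."""
    stack, seen = [v], set()
    while stack:
        w = stack.pop()
        if w == u:
            return True
        if w in seen:
            continue
        seen.add(w)
        stack.extend(np.flatnonzero(adj[w]).tolist())
    return False

def sample_dag(edge_probs, rng, p_stop=0.2):
    """Sequentially sample a DAG: at each step, either stop or add one
    edge drawn in proportion to `edge_probs`, skipping edges that would
    introduce a cycle. `edge_probs` stands in for a learned policy."""
    n = edge_probs.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    while True:
        valid = [(u, v) for u in range(n) for v in range(n)
                 if u != v and not adj[u, v] and not creates_cycle(adj, u, v)]
        if not valid or rng.random() < p_stop:
            return adj
        w = np.array([edge_probs[u, v] for u, v in valid])
        u, v = valid[rng.choice(len(valid), p=w / w.sum())]
        adj[u, v] = True

rng = np.random.default_rng(0)
dag = sample_dag(rng.random((4, 4)), rng)  # 4-node toy example
```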
Fri 8:15 a.m. - 8:30 a.m. | Torsional Diffusion for Molecular Conformer Generation (Oral)
Diffusion-based generative models generate samples by mapping noise to data via the reversal of a diffusion process, which typically consists of the addition of independent Gaussian noise to every data coordinate. This diffusion process is, however, not well suited to the fundamental task of molecular conformer generation, where the degrees of freedom differentiating conformers lie mostly in torsion angles. We therefore propose Torsional Diffusion, which generates conformers by defining a diffusion process over the space SO(2)^n, a high-dimensional torus representing torsion angles, together with a novel SE(3)-equivariant model capable of accurately predicting the score of this process. Empirically, we demonstrate that our model outperforms state-of-the-art methods on diversity metrics and performs competitively on precision metrics. Compared to Gaussian diffusion models, Torsional Diffusion enables significantly more accurate generation while using almost two orders of magnitude fewer inference time steps.
Bowen Jing · Gabriele Corso · Regina Barzilay · Tommi Jaakkola
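As a minimal illustration of why a torus is the natural space here: a forward diffusion step on torsion angles just adds Gaussian noise and wraps the result back into [-pi, pi). This sketch covers only the forward noising on SO(2)^n, not the paper's SE(3)-equivariant score model.

```python
import numpy as np

def perturb_torsions(angles, sigma, rng):
    """One forward diffusion step on the torus SO(2)^n: add isotropic
    Gaussian noise to each torsion angle, then wrap into [-pi, pi)."""
    noisy = angles + sigma * rng.standard_normal(angles.shape)
    return (noisy + np.pi) % (2 * np.pi) - np.pi

rng = np.random.default_rng(0)
torsions = rng.uniform(-np.pi, np.pi, size=7)   # 7 rotatable bonds (toy)
noisy = perturb_torsions(torsions, sigma=0.5, rng=rng)
```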
Fri 9:30 a.m. - 10:20 a.m. | Invited talk by Geemi Wellawatte (Invited Talk)
Fri 10:20 a.m. - 10:35 a.m. | Denoising Diffusion Restoration Models (Oral)
Many interesting tasks in image restoration can be cast as linear inverse problems. A recent family of approaches for solving these problems uses stochastic algorithms that sample from the posterior distribution of natural images given the measurements. However, efficient solutions often require problem-specific supervised training to model the posterior, whereas unsupervised methods that are not problem-specific typically rely on inefficient iterative methods. This work addresses these issues by introducing Denoising Diffusion Restoration Models (DDRM), an efficient, unsupervised posterior sampling method. Motivated by variational inference, DDRM takes advantage of a pre-trained denoising diffusion generative model for solving any linear inverse problem. We demonstrate DDRM's versatility on several image datasets for super-resolution, deblurring, inpainting, and colorization under various amounts of measurement noise. DDRM outperforms the current leading unsupervised methods on the diverse ImageNet dataset in reconstruction quality, perceptual quality, and runtime, being $5\times$ faster than the nearest competitor. DDRM also generalizes well to natural images outside the distribution of the observed ImageNet training set.
Bahjat Kawar · Michael Elad · Stefano Ermon · Jiaming Song
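For readers unfamiliar with the setup: a linear inverse problem has the form $y = Hx + z$, and DDRM operates in the spectral space given by the singular value decomposition of $H$. The toy snippet below (hypothetical shapes, not the authors' code) shows how the SVD decouples noisy measurements into independent per-singular-value observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 8                       # signal and measurement dimensions (toy)
H = rng.standard_normal((m, n))    # linear degradation, e.g. blur or downsampling
x = rng.standard_normal(n)         # clean signal
y = H @ x + 0.1 * rng.standard_normal(m)   # noisy measurements y = Hx + z

# The SVD of the operator decouples the problem into independent 1-D
# sub-problems, one per singular value -- the structure DDRM exploits.
U, s, Vt = np.linalg.svd(H, full_matrices=False)
y_spec = (U.T @ y) / s             # measurements in the spectral basis
x_spec = Vt @ x                    # signal coordinates seen by each singular vector
```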
Fri 10:35 a.m. - 10:50 a.m. | Semi-Discrete Normalizing Flows through Differentiable Voronoi Tessellation (Oral)
Mapping between discrete and continuous distributions is a difficult task, and many have had to resort to approximate or heuristic approaches. We propose a tessellation-based approach that directly learns quantization boundaries on a continuous space, complete with exact likelihood evaluations. This is done by constructing normalizing flows on convex polytopes defined via a differentiable tessellation. Using a simple homeomorphism with an efficient log determinant Jacobian, we can then cheaply parameterize distributions on bounded domains. We explore this approach in two application settings, mapping from discrete to continuous and vice versa. First, a Voronoi dequantization allows automatically learning quantization boundaries in a multidimensional space. The location of boundaries and distances between regions can encode useful structural relations between the quantized discrete values. Second, a Voronoi mixture model has constant computation cost for likelihood evaluation regardless of the number of mixture components. Empirically, we show improvements over existing methods across a range of structured data modalities, and find that we can achieve a significant gain from just adding Voronoi mixtures to a baseline model.
Tian Qi Chen · Brandon Amos · Maximilian Nickel
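The basic operation underlying a Voronoi dequantization, mapping a continuous point to the cell of its nearest anchor, can be sketched in a few lines. The anchors below stand in for learned cell centers; the actual model additionally makes the tessellation differentiable, which this sketch does not attempt.

```python
import numpy as np

def voronoi_assign(points, anchors):
    """Map each continuous point to the index of its nearest anchor,
    i.e. the Voronoi cell it falls in. points: (n, d), anchors: (k, d)."""
    d2 = ((points[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
anchors = rng.standard_normal((5, 2))      # 5 cell centers (toy)
points = rng.standard_normal((100, 2))
codes = voronoi_assign(points, anchors)    # discrete codes for continuous inputs
```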
Fri 11:00 a.m. - 11:50 a.m. | Invited talk by Prof. Pratyush Tiwary (Invited Talk)
Fri 11:50 a.m. - 12:40 p.m. | Invited talk by Octavian Ganea (Invited Talk)
Fri 12:40 p.m. - 12:50 p.m. | Closing Remarks
Poster Session (Poster)
Meta-FAVAE: Toward Fast and Diverse Few-shot Image Generation via Meta-Learning and Feedback Augmented Adversarial VAE (Poster)
Learning to synthesize realistic images of new categories from just one or a few examples is a challenging task for deep generative models, which usually require training on large amounts of data. In this work, we propose a data-efficient meta-learning framework for fast adaptation to few-shot image generation tasks, built on an adversarial variational auto-encoder with a feedback augmentation strategy. By training the model as a meta-learner, our method adapts faster to new tasks with a significant reduction in model parameters. We design a novel feedback-augmented adversarial variational auto-encoder that learns to synthesize new samples for an unseen category from just a few examples; the generated interpolated samples are then fed back as additional encoder inputs during training, which effectively increases the diversity of the decoder output and prevents the model from overfitting on the insufficient samples of the unseen category. Additionally, through the dual concatenation of latent codes and random noise vectors, the method generalizes to more complex color images compared to existing meta-learning-based methods. Experimental results on three datasets show that our model adapts much faster to generation tasks on unseen categories while generating high-quality and diverse images.
Fangli Ying · Aniwat Phaphuangwittayakul · Yi Guo · Xiaoyue Huang · 王 乐
UNCONDITIONAL IMAGE-TEXT PAIR GENERATION WITH MULTIMODAL CROSS QUANTIZER (Poster)
Though deep generative models have gained a lot of attention, most existing works are designed for unimodal generation tasks. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input and learns a joint quantized representation space, so that the image-text pair can be converted to a sequence of unified indices. We can then use autoregressive generative models to model the joint image-text representation, and even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach effectively generates semantically consistent image-text pairs and also enhances meaningful alignment between image and text.
Hyungyung Lee · Sungjin Park · Edward Choi
MMVAE+: Enhancing the Generative Quality of Multimodal VAEs without Compromises (Poster)
Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with a large number of modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. We focus on the mixture-of-experts multimodal VAE (MMVAE), which achieves good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the MMVAE that improves its generative quality, while maintaining high semantic coherence. For this, shared and modality-specific information is modelled in separate latent subspaces. In contrast to previous approaches with separate subspaces, our model is robust to changes in latent dimensionality and regularization hyperparameters. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
Emanuele Palumbo · Imant Daunhawer · Julia E Vogt
Can GANs Recover Faults in Electrical Motor Sensors? (Poster)
Electrical motors in industrial and emerging applications such as electric vehicles require high dynamic performance, robustness against parameter variation, and reliability. Recent advances in neural network-based estimators and fault detection techniques rely heavily on accurate sensor information. Due to the extreme operating conditions of electrical motors, there is always a chance of sensor failure, which might lead to poor performance in downstream tasks that use neural networks. This paper introduces the problem of identifying and recovering sensor faults using generative adversarial networks. We consider sensors monitoring quantities such as currents, voltages, speed, torque, temperature, and vibrations. We introduce fault models for these sensors to simulate training datasets, and use existing GAN-based data imputation methods as baseline solutions.
Sagar Verma · Nicolas Henwood · Marc Castella · Jean-Christophe Pesquet · Al Jebai
Meta-GAN for Few-Shot Image Generation (Poster)
While Generative Adversarial Networks (GANs) have rapidly advanced the state of the art in deep generative modeling, they require a large amount of diverse datapoints to train adequately, limiting their potential in domains where data is constrained. In this study, we explore few-shot image generation, enabling GANs to rapidly adapt to a small support set of datapoints from an unseen target domain and generate novel, high-quality examples from that domain. To do so, we adapt two common meta-learning algorithms from few-shot classification, Model-Agnostic Meta-Learning (MAML) and Reptile, to GANs, meta-training the generator and discriminator to learn an optimal weight initialization such that fine-tuning on a new task is rapid. Empirically, we demonstrate how our MAML and Reptile meta-learning algorithms, meta-trained on tasks from the MNIST and SVHN datasets, rapidly adapt at test time to unseen tasks and generate high-quality, photorealistic samples from these domains given only tens of support examples. In fact, we show that the generated image quality of these few-shot adapted models is on par with that of a baseline model vanilla-trained on thousands of samples from the same domain. Intriguingly, meta-training also takes substantially less time to converge than baseline training, indicating the power and efficiency of our approach. We also demonstrate the generalizability of our algorithms, working with both CNN- and Transformer-parametrized GANs. Overall, we present our MAML and Reptile meta-learning algorithms as effective strategies for few-shot image generation, improving the feasibility of deep generative models in practice.
Arvind Sridhar
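Of the two algorithms, Reptile has the simpler meta-update: fine-tune a copy of the weights on one sampled task, then move the meta-weights a small step toward the adapted weights. A schematic sketch, where `sgd_update` is a hypothetical inner-loop optimizer rather than anything from the paper:

```python
def reptile_step(weights, task_batch, inner_steps, inner_lr, meta_lr, sgd_update):
    """One Reptile meta-step: adapt a copy of the weights on a sampled
    task, then interpolate the meta-weights toward the adapted weights."""
    adapted = dict(weights)                  # weights: dict of name -> array
    for _ in range(inner_steps):
        adapted = sgd_update(adapted, task_batch, inner_lr)  # inner-loop SGD
    # meta-update: theta <- theta + meta_lr * (theta_task - theta)
    return {k: w + meta_lr * (adapted[k] - w) for k, w in weights.items()}
```

MAML differs in that it backpropagates through the inner-loop updates; Reptile's first-order update above avoids that second-order computation.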
An Exploration of Learnt Representations of W Jets (Poster)
I present a Variational Autoencoder (VAE) trained on collider physics data (specifically boosted $W$ jets), with reconstruction error given by an approximation to the Earth Mover's Distance (EMD) between input and output jets. This VAE learns a concrete representation of the data manifold, with semantically meaningful and interpretable latent space directions which are hierarchically organized in terms of their relation to physical EMD scales in the underlying physical generative process. The variation of the latent space structure with a resolution hyperparameter provides insight into the scale-dependent structure of the dataset and its information complexity. I introduce two measures of the dimensionality of the learnt representation that are calculated from this scaling.
Jack Collins
Video Diffusion Models (Poster)
We present results on video generation using diffusion models. We propose an architecture for video diffusion models which is a natural extension of the standard image architecture, and we show that it is effective to jointly train on image and video modeling. We show how to generate long videos using a new conditioning technique which performs better than previously proposed methods, and we present results on text-conditioned video generation and state-of-the-art results on UCF101 unconditional video generation.
Jonathan Ho · Tim Salimans · Alexey Gritsenko · William Chan · Mohammad Norouzi · David Fleet
Score-Based Generative Models for Wireless Channel Modeling and Estimation (Poster)
In this work, we investigate score-based models for learning the distribution of multiple-input multiple-output (MIMO) wireless channels in structured stochastic environments, using either clean or corrupted (noisy) data for training. We find that score-based models are capable of generating high-quality synthetic channels, and have robust downstream estimation performance, sometimes surpassing strong baselines by up to $10$ dB in estimation error when the inverse problem is ill-posed. Our preliminary results on training with corrupted data show improved performance against simple baselines, and introduce a very promising future research direction. Code will be made publicly available upon paper acceptance.
Marius Arvinte · Jonathan Tamir
Oracle Guided Image Synthesis with Relative Queries (Poster)
Isolating and controlling specific features in the outputs of generative models in a user-friendly way is a difficult and open-ended problem. We develop techniques that allow a user to generate an image they are envisioning in their head by answering a sequence of relative queries of the form "do you prefer image $a$ or image $b$?" Our framework consists of a Conditional VAE that uses the collected relative queries to partition the latent space into preference-relevant features and non-preference-relevant features. We then use the user's responses to relative queries to determine the preference-relevant features that correspond to their envisioned output image. Additionally, we develop techniques for modeling the uncertainty in images' predicted preference-relevant features, allowing our framework to generalize to scenarios in which the relative query training set contains noise.
Alec Helbling · Christopher Rozell · Matthew O'Shaughnessy · Kion Fallah
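As a toy version of such a query loop, consider a single preference-relevant latent coordinate and a deterministic oracle: assuming preference is unimodal along that direction, repeated "image a or image b?" comparisons shrink the candidate interval. The `decode` and `prefers` callables are hypothetical stand-ins for the generator and the user, not the paper's framework.

```python
def query_search(decode, prefers, lo=-3.0, hi=3.0, n_queries=10):
    """Narrow an interval of a single latent coordinate with relative
    queries: each round renders two candidates and keeps the side of the
    interval around the preferred one (ternary search, unimodal oracle)."""
    for _ in range(n_queries):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if prefers(decode(a), decode(b)):   # True if the first image is preferred
            hi = b                          # the optimum lies toward `a`
        else:
            lo = a
    return 0.5 * (lo + hi)
```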
Scalable Computation of Monge Maps with General Costs (Poster)
The Monge map is the optimal transport map between two probability distributions and provides a principled approach to transforming one distribution into another. Despite the rapid development of numerical methods for optimal transport problems, computing Monge maps remains challenging, especially for high-dimensional problems. In this paper, we present a scalable algorithm for computing the Monge map between two probability distributions. Our algorithm is based on a weak form of the optimal transport problem, so it only requires samples from the marginals instead of their analytic expressions, and it can accommodate optimal transport between two distributions with different dimensions. Our algorithm is suitable for general cost functions, in contrast with other existing methods for estimating Monge maps from samples, which are usually limited to quadratic costs. The performance of our algorithm is demonstrated through a series of experiments with both synthetic and realistic data.
Jiaojiao Fan · Shu Liu · Shaojun Ma · Yongxin Chen · Hao-Min Zhou
Causal-TGAN: Modeling Tabular Data Using Causally-Aware GAN (Poster)
Generative adversarial network (GAN)-based tabular data generation has recently received significant attention for its power for data augmentation when available data is limited. Most prior works have applied generic GAN frameworks to tabular data generation without explicitly considering inter-variable relationships, which are important for modeling tabular data distributions. In this work, we design Causal-TGAN, a causally-aware generator architecture that can capture the relationships among variables (continuous-type, discrete-type, and mixed-type) by explicitly modeling pre-defined inter-variable causal relationships. The flexibility of Causal-TGAN lies in its capability to support different degrees of subject-matter-expert domain knowledge (e.g., complete or partial) about the inter-variable causal relations. Extensive experimental results on both simulated and real-world datasets demonstrate that exploiting causal relations in deep generative models can improve the quality of generated tabular data compared to the state of the art.
Bingyang Wen · Yupeng Cao · Fan Yang · Koduvayur Subbalakshmi · Rajarathnam Chandramouli
Implicit Neural Video Compression (Poster)
We propose a method to compress full-resolution video sequences with implicit neural representations. Each frame is represented as a neural network that maps coordinate positions to pixel values. We use a separate implicit network to modulate the coordinate inputs, which enables efficient motion compensation between frames. Together with a small residual network, this allows us to efficiently compress P-frames relative to the previous frame. We further lower the bitrate by storing the network weights with learned integer quantization. Our method offers several simplifications over established neural video codecs: it does not require the receiver to have access to a pretrained neural network, does not use expensive interpolation-based warping operations, and does not require a separate training dataset.
Yunfan Zhang · Ties van Rozendaal · Johann Brehmer · Markus Nagel · Taco Cohen
Semantic Image Synthesis with Semantically Coupled VQ-Model (Poster)
Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated. We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images. Instead of training an autoregressive Transformer on separately learned conditioning latents and image latents, we find that jointly learning the conditioning and image latents significantly improves the modeling capabilities of the Transformer model. While our jointly trained VQ-model achieves a similar reconstruction performance to a vanilla VQ-model for both semantic and image latents, tying the two modalities at the autoencoding stage proves to be an important ingredient to improve autoregressive modeling performance. We show that our model improves semantic image synthesis using autoregressive models on the popular semantic image datasets ADE20k, Cityscapes and COCO-Stuff.
Stephan Alaniz · Thomas Hummel · Zeynep Akata
Annealed Importance Sampling meets Score Matching (Poster)
Annealed Importance Sampling (AIS) is one of the most effective methods for marginal likelihood estimation. It relies on a sequence of distributions interpolating between a tractable initial distribution and the posterior of interest, from which we simulate approximately using a non-homogeneous Markov chain. To obtain an importance sampling (IS) estimate of the marginal likelihood, AIS introduces an extended target distribution to reweight the Markov chain proposal. While much effort has been devoted to improving the proposal distribution used by AIS by changing the intermediate distributions and corresponding Markov kernels, an underappreciated issue is that AIS uses a convenient but suboptimal extended target distribution, which can hinder its performance. Here we leverage recent progress in score-based generative modeling to learn the optimal extended target distribution for a given AIS proposal using score matching ideas. We demonstrate this novel differentiable AIS procedure on a number of synthetic benchmark distributions and a normalizing flow target.
Arnaud Doucet · Will Grathwohl · Alexander de G. Matthews · Heiko Strathmann
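For reference, here is a minimal AIS implementation with geometric tempering between the initial distribution and the (unnormalized) target, using random-walk Metropolis transitions. This is the standard algorithm the abstract builds on, not the paper's score-matched extension.

```python
import numpy as np

def ais_log_weight(log_p0, log_p1, sample_p0, betas, n_mcmc=5, step=0.5, rng=None):
    """AIS from p0 to p1 along intermediate densities
    log_gamma_t = (1 - beta_t) * log_p0 + beta_t * log_p1.
    Returns one sample and its log importance weight."""
    rng = rng or np.random.default_rng()
    x = sample_p0(rng)
    log_w = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # incremental weight: log gamma_t(x) - log gamma_{t-1}(x)
        log_w += (b - b_prev) * (log_p1(x) - log_p0(x))
        def log_gamma(z):
            return (1 - b) * log_p0(z) + b * log_p1(z)
        for _ in range(n_mcmc):   # Metropolis moves at the new temperature
            prop = x + step * rng.standard_normal(x.shape)
            if np.log(rng.random()) < log_gamma(prop) - log_gamma(x):
                x = prop
    return x, log_w

# toy example: anneal from N(0, 1) to an unnormalized N(3, 0.5^2);
# averaging exp(log_w) over repeats estimates the normalizing-constant ratio
log_p0 = lambda x: -0.5 * np.sum(x ** 2)
log_p1 = lambda x: -0.5 * np.sum(((x - 3.0) / 0.5) ** 2)
x, log_w = ais_log_weight(log_p0, log_p1,
                          lambda rng: rng.standard_normal(1),
                          betas=np.linspace(0.0, 1.0, 50))
```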
From Points to Functions: Infinite-dimensional Representations in Diffusion Models (Poster)
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution, as opposed to Generative Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs), which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory, which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory, which delivers samples from the desired distribution. Abstreiter et al. (2021) showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce attention- and recurrence-based modules that "learn to mix" the information content of various time steps, such that the resulting representation leads to superior performance in downstream tasks.
Sarthak Mittal · Guillaume Lajoie · Stefan Bauer · Arash Mehrjou
Simulate Time-integrated Coarse-grained Molecular Dynamics with Geometric Machine Learning (Poster)
Molecular dynamics (MD) simulation is the workhorse of various scientific domains but is limited by high computational cost. Learning-based force fields have made major progress in accelerating ab-initio MD simulation but are still not fast enough for many real-world applications that require long-time MD simulation. In this paper, we adopt a different machine learning approach where we coarse-grain a physical system using graph clustering and model the system evolution with a very large time-integration step using graph neural networks. A novel score-based GNN refinement module resolves the long-standing challenge of long-time simulation instability. Despite only being trained with short MD trajectory data, our learned simulator can generalize to unseen novel systems, and simulate for much longer than the training trajectories. Properties requiring 10-100 ns level long-time dynamics can be accurately recovered at several orders of magnitude higher speed than classical force fields. We demonstrate the effectiveness of our method on two realistic complex systems: (1) single-chain coarse-grained polymers in implicit solvent; (2) multi-component Li-ion polymer electrolyte systems.
Xiang Fu · Tian Xie · Nathan Rebello · Bradley Olsen · Tommi Jaakkola
Zero-Shot Recommender Systems (Poster)
Performance of recommender systems (RecSys) relies heavily on the amount of training data available. This poses a chicken-and-egg problem for early-stage products, whose amount of data, in turn, relies on the performance of their RecSys. In this paper, we explore the possibility of zero-shot learning in RecSys, to enable generalization from an old dataset to an entirely new dataset. We develop, to the best of our knowledge, the first deep generative model, dubbed ZEro-Shot Recommenders (ZESRec), that is trained on an old dataset and generalizes to a new one with neither overlapping users nor overlapping items, a setting that contrasts with typical cross-domain RecSys, which has either overlapping users or items. We study three pairs of real-world datasets and demonstrate that ZESRec can successfully enable such zero-shot recommendations, opening up new opportunities for resolving the chicken-and-egg problem for data-scarce startups or early-stage products.
HAO DING · Anoop Deoras · Bernie Wang · Hao Wang
SIReN-VAE: Leveraging Flows and Amortized Inference for Bayesian Networks (Poster)
Initial work on variational autoencoders assumed independent latent variables with simple distributions. Subsequent work has explored incorporating more complex distributions and dependency structures: including normalizing flows in the encoder network allows latent variables to entangle non-linearly, creating a richer class of distributions for the approximate posterior, and stacking layers of latent variables allows more complex priors to be specified for the generative model. This work explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs. This is achieved by extending both the prior and inference network with graphical residual flows, i.e., residual flows that encode conditional independence by masking the weight matrices of the flow's residual blocks. We compare our model's performance on several synthetic datasets and show its potential in data-sparse settings.
Jacobie Mouton · Rodney Kroon
Improved Image Generation via Sparsity (Poster)
The interest of the deep learning community in image synthesis has grown massively in recent years. Nowadays, deep generative methods, and specifically Generative Adversarial Networks (GANs), are leading to state-of-the-art performance, capable of synthesizing images that appear realistic. While the efforts for improving the quality of the generated images are extensive, most attempts still consider the generator part as an uncorroborated "black-box". In this paper, we aim to provide a better understanding of the image generation process. We interpret existing generators as implicitly relying on sparsity-inspired models. More specifically, we show that generators can be viewed as manifestations of the Convolutional Sparse Coding (CSC) and its Multi-Layered version (ML-CSC) synthesis processes. We leverage this observation by explicitly enforcing a sparsifying regularization on appropriately chosen activation layers in the generator and demonstrate that this leads to improved image synthesis. Furthermore, we show that the same rationale and benefits apply to generators serving inverse problems, demonstrated on the Deep Image Prior (DIP) method.
roy ganz · Michael Elad
Graphical Residual Flows (Poster)
Graphical flows add further structure to normalizing flows by encoding non-trivial variable dependencies. Previous graphical flow models have focused primarily on a single flow direction: the normalizing direction for density estimation, or the generative direction for inference. However, to use a single flow to perform tasks in both directions, the model must exhibit stable and efficient flow inversion. This work introduces graphical residual flows, a graphical flow based on invertible residual networks. Our approach to incorporating dependency information in the flow means that we are able to calculate the Jacobian determinant of these flows exactly. Our experiments confirm that graphical residual flows provide stable and accurate inversion that is also more time-efficient than alternative flows with similar task performance. Furthermore, our model provides performance competitive with other graphical flows for both density estimation and inference tasks.
Jacobie Mouton · Rodney Kroon
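The masking idea is easy to illustrate: multiplying a layer's weight matrix elementwise by a mask built from the Bayesian network's adjacency zeroes out any connection that would violate the prescribed dependencies. A sketch under assumed conventions (keeping the diagonal, and the adjacency orientation used here, are illustration choices, not the authors' exact construction):

```python
import numpy as np

def masked_linear(x, W, adjacency):
    """One masked layer of a graphical residual block: variable j may only
    influence variable i if j is a parent of i (or i == j).
    Convention here: adjacency[i, j] = 1 means j is a parent of i."""
    mask = adjacency + np.eye(adjacency.shape[0])
    return x @ (W * mask).T

adjacency = np.array([[0, 0, 0],     # x0 has no parents
                      [1, 0, 0],     # x1 depends on x0
                      [1, 1, 0]])    # x2 depends on x0 and x1
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
y = masked_linear(rng.standard_normal((4, 3)), W, adjacency)  # batch of 4
```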
Inkorrect: Digital Ink Spelling Correction (Poster)
We introduce Inkorrect, a digital ink (online handwriting) spelling correction approach. We show that existing metrics don't capture the quality of spelling correction, and propose a new one. Our approach outperforms previous work in automated and human evaluation, while also being more data- and label-efficient.
Andrii Maksai · Henry Rowley · Jesse Berent · Claudiu Musat
Conditional Generative Quantile Networks via Optimal Transport (Poster)
Quantile regression has a natural extension to generative modelling because quantile convergence is pointwise and therefore stronger than convergence in distribution. While the pinball quantile loss works well in the scalar case, it cannot be readily extended to the vector case. In this work, we propose a multivariate quantile approach for generative modelling using optimal transport, with provable guarantees. Specifically, we suggest that by optimizing smooth functions parameterized by neural networks with respect to the dual of the correlation maximization problem, the function converges uniformly to the optimal convex potential. We thus construct a Brenier map as our generative quantile network. Furthermore, we introduce conditioning by approximating the convex potential using a first-order approximation with respect to the covariates. Through extensive experiments on synthetic and real datasets for conditional generative and probabilistic time-series forecasting tasks, we demonstrate the efficacy and versatility of our theoretically motivated model as a distribution estimator and probabilistic forecaster.
Jesse Sun · Dihong Jiang · Yaoliang Yu
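For reference, the scalar pinball loss at level $\tau$ is $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$ with $u = y - \hat{q}$, and minimizing its expectation recovers the $\tau$-quantile. A small numpy check:

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Pinball (quantile) loss: penalizes under-predictions with weight tau
    and over-predictions with weight 1 - tau."""
    u = y - q
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

# the minimizer over a constant q approaches the empirical tau-quantile
y = np.random.default_rng(0).standard_normal(10_000)
qs = np.linspace(-3, 3, 601)
best_q = qs[np.argmin([pinball_loss(y, q, tau=0.9) for q in qs])]
# best_q is close to 1.28, the 0.9-quantile of a standard normal
```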
Benchmarking Generative Latent Variable Models for Speech (Poster)
Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and compare them against state-of-the-art deterministic models. We report the likelihood, a widely used metric in the image domain that is rarely, or incomparably, reported for speech models. To assess the quality of the learned representations, we also compare their usefulness for phoneme recognition. Finally, we adapt the Clockwork VAE, a state-of-the-art temporal LVM for video generation, to the speech domain. Despite being autoregressive only in latent space, we find that the Clockwork VAE can outperform previous LVMs and reduce the gap to deterministic models by using a hierarchy of latent variables.
Jakob Havtorn · Lasse Borgholt · Søren Hauberg · Jes Frellsen · Lars Maaløe
Object Representations as Fixed Points: Training Iterative Inference Algorithms with Implicit Differentiation (Poster)
Deep generative models, particularly those that aim to factorize the observations into discrete entities (such as objects), must often use iterative inference procedures that break symmetries among equally plausible explanations for the data. Such inference procedures include variants of the expectation-maximization algorithm and structurally resemble clustering algorithms in a latent space. However, combining such methods with deep neural networks necessitates differentiating through the inference process, which can make optimization exceptionally challenging. In this work, we observe that such iterative inference methods can be made differentiable by means of the implicit function theorem, and develop an implicit differentiation approach that improves the stability and tractability of training such models by decoupling the forward and backward passes. This connection enables us to apply recent advances in optimizing implicit layers to not only improve the stability and optimization of the slot attention module in SLATE, a state-of-the-art method for learning entity representations, but do so with constant space and time complexity in backpropagation and only one additional line of code.
Michael Chang · Thomas L. Griffiths · Sergey Levine
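The "one additional line of code" alludes to a first-order application of the implicit function theorem: run the fixed-point iteration without tracking gradients, then differentiate through a single final step from the detached fixed point. A PyTorch-style sketch with a hypothetical update function `f`, not the SLATE code:

```python
import torch

def fixed_point_forward(f, z0, x, n_iters=20):
    """Iterate z <- f(z, x) to (approximate) convergence without building a
    computation graph, then take one differentiable step from the detached
    fixed point: a first-order approximation of implicit differentiation
    whose memory cost is independent of the number of iterations."""
    z = z0
    with torch.no_grad():
        for _ in range(n_iters):
            z = f(z, x)
    return f(z.detach(), x)   # gradients flow only through this last step
```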