## Neural Compression: From Information Theory to Applications

### Stephan Mandt, Robert Bamler, Yingzhen Li, Christopher Schroers, Yang Yang, Max Welling, Taco Cohen

Abstract:

Data compression is a problem of great practical importance, and a new frontier for machine learning research that combines empirical findings (from the deep probabilistic modeling literature) with fundamental theoretical insights (from information theory, source coding, and minimum description length theory). Recent work building on deep generative models such as variational autoencoders, GANs, and normalizing flows showed that novel machine-learning-based compression methods can significantly outperform state-of-the-art classical compression codecs for image and video data. At the same time, these neural compression methods provide new evaluation metrics for model and inference performance on a rate/distortion trade-off. This workshop aims to draw more attention to the young and highly impactful field of neural compression. In contrast to other workshops that focus on practical compression performance, our goal is to bring together researchers from deep learning, information theory, and probabilistic modeling, to learn from each other and to encourage exchange on fundamentally novel issues such as the role of stochasticity in compression algorithms or ethical risks of semantic compression artifacts.


### Schedule

- Fri 3:30 a.m. - 3:35 a.m.: Opening Remarks (Moderation). Robert Bamler

- Fri 3:35 a.m. - 4:00 a.m.: Fabian Mentzer (Invited Talk). Fabian Mentzer, Taco Cohen

  TBD

- Fri 4:00 a.m. - 4:10 a.m.: Q&A Fabian Mentzer (Q&A). Taco Cohen

- Fri 4:10 a.m. - 4:35 a.m.: Karen Ullrich (Invited Talk (live)). Taco Cohen

- Fri 4:35 a.m. - 4:45 a.m.: Q&A Karen Ullrich (Q&A). Taco Cohen

- Fri 4:45 a.m. - 5:05 a.m.: Oral 1: Yann Dubois et al., Lossy Compression for Lossless Prediction (Contributed Talk). Taco Cohen

  Most data is "seen" only by algorithms. Yet, data compressors are designed for perceptual fidelity rather than for storing information needed by algorithms performing downstream tasks. So, we are likely storing vast amounts of unneeded information. In this paper, we characterize the minimum bit-rates required to ensure high performance on all predictive tasks that are invariant under a set of transformations. Based on our theory, we design unsupervised objectives for training neural compressors that are closely related to self-supervised learning and generative modeling. Using these objectives, we achieve rate savings of around 60% on standard datasets, like MNIST, without decreasing classification performance.

- Fri 5:05 a.m. - 5:10 a.m.: Spotlight 1: Lucas Theis & Aaron Wagner, A coding theorem for the rate-distortion-perception function (Contributed Talk)

  The rate-distortion-perception function (RDPF; Blau and Michaeli, 2019) has emerged as a useful tool for thinking about realism and distortion of reconstructions in lossy compression. Unlike the rate-distortion function, however, it is unknown whether encoders and decoders exist that achieve the rate suggested by the RDPF. Building on results by Li and El Gamal (2018), we show that the RDPF can indeed be achieved using stochastic, variable-length codecs. For this class of codecs, we also prove that the RDPF lower-bounds the achievable rate.

- Fri 5:10 a.m. - 5:15 a.m.: Spotlight 2: Yibo Yang and Stephan Mandt, Lower Bounding Rate-Distortion From Samples (Contributed Talk)

  The rate-distortion function $R(D)$ tells us the minimal number of bits on average to compress a random object within a given distortion tolerance. A lower bound on $R(D)$ therefore represents a fundamental limit on the best possible rate-distortion performance of any lossy compression algorithm, and can help us assess the potential room for improvement. We make a first attempt at an algorithm for computing such a lower bound, applicable to any data source that we have samples of. Based on a dual characterization of $R(D)$ in terms of a constrained maximization problem, our method approximates the exact constraint function by an asymptotically unbiased sample-based estimator, allowing for stochastic optimization. On a 2D Gaussian source, we obtain a lower bound within 1 bit of the true $R(D)$ on average.

- Fri 5:15 a.m. - 5:20 a.m.: Spotlight 3: James Townsend and Iain Murray, Lossless compression with state space models using bits back coding (Contributed Talk)

  We generalize the 'bits back with ANS' method to time-series models with a latent Markov structure. This family of models includes hidden Markov models (HMMs), linear Gaussian state space models (LGSSMs) and many more. We provide experimental evidence that our method is effective for small scale models, and discuss its applicability to larger scale settings such as video compression.

- Fri 5:20 a.m. - 5:25 a.m.: Spotlight 4: Théo Ladune et al., Conditional Coding for Flexible Learned Video Compression (Contributed Talk)

  This paper introduces a novel framework for end-to-end learned video coding. Image compression is generalized through conditional coding to exploit information from reference frames, allowing intra and inter frames to be processed with the same coder. The system is trained through the minimization of a rate-distortion cost, with no pre-training or proxy loss. Its flexibility is assessed under three coding configurations (All Intra, Low-delay P and Random Access), where it is shown to achieve performance competitive with the state-of-the-art video codec HEVC.

- Fri 5:25 a.m. - 5:30 a.m.: Spotlight 5: Ruihan Yang et al., Scale Space Flow With Autoregressive Priors (Contributed Talk)

  There has been a recent surge of interest in neural video compression models that combine data-driven dimensionality reduction with learned entropy coding. Scale Space Flow (SSF) is among the most popular variants due to its favorable rate-distortion performance. Recent work showed that this approach could be further improved by structured priors and stochastic temporal autoregressive transforms on the frame level. However, as of early 2021, most state-of-the-art compression approaches work with time-independent priors. Assuming that frame latents are still temporally correlated, further compression gains should be expected by conditioning the priors on temporal information. We show that the naive way of conditioning priors on previous stochastic latent states degrades performance, but temporal conditioning on a deterministic quantity does lead to a consistent improvement over all baselines. Evaluating the benefits of the temporal prior given the involved challenges in training and deployment remains an open question.

- Fri 5:30 a.m. - 6:00 a.m.: Q&A + Discussion Oral 1 & Spotlights 1-5 (Q&A + Discussion). Christopher Schroers

- Fri 6:00 a.m. - 6:30 a.m.: Break

- Fri 6:30 a.m. - 6:32 a.m.: Introduction Rianne van den Berg (Moderation). Robert Bamler

- Fri 6:32 a.m. - 6:55 a.m.: Rianne van den Berg (Invited Talk). Rianne van den Berg, Robert Bamler

  TBD

- Fri 6:55 a.m. - 7:05 a.m.: Q&A Rianne van den Berg (Q&A). Robert Bamler

- Fri 7:05 a.m. - 7:30 a.m.: Oren Rippel (Invited Talk). Oren Rippel, Robert Bamler

  TBD

- Fri 7:30 a.m. - 7:40 a.m.: Q&A Oren Rippel (Q&A). Robert Bamler

- Fri 7:40 a.m. - 8:00 a.m.: Break

- Fri 8:00 a.m. - 9:00 a.m.: Poster Session. Yang Yang

- Fri 8:00 a.m. - 8:02 a.m.: Opening Remarks for Poster Session (Moderation). Yang Yang

- Fri 9:00 a.m. - 10:00 a.m.: Panel Discussion. Stephan Mandt

  Panel discussion with Alex Alemi (Google), Leonardo Chiariglione (MPEG & MPAI), Irina Higgins (DeepMind), Philipp Krähenbühl (UT Austin), and Scott Labrozzi (Disney Streaming Services)

- Fri 10:00 a.m. - 10:02 a.m.: Introduction Jonathan Ho (Moderation). Stephan Mandt

- Fri 10:02 a.m. - 10:25 a.m.: Jonathan Ho (Invited Talk). Stephan Mandt

  TBD

- Fri 10:25 a.m. - 10:35 a.m.: Q&A Jonathan Ho (Invited Talk). Stephan Mandt

- Fri 10:35 a.m. - 11:00 a.m.: Johannes Ballé (Invited Talk). Johannes Ballé, Taco Cohen

  TBD

- Fri 11:00 a.m. - 11:10 a.m.: Q&A Johannes Ballé (Q&A). Taco Cohen

- Fri 11:10 a.m. - 11:30 a.m.: Oral 2: Yangjun Ruan et al., Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding (Contributed Talk). Yang Yang

  Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and the true posterior. In this paper, we show how to remove this gap asymptotically by deriving bits-back schemes from tighter variational bounds. The key idea is to exploit extended space representations of Monte Carlo estimators of the marginal likelihood. Naively applied, our schemes would require more initial bits than the standard bits-back coder, but we show how to drastically reduce this additional cost with couplings in the latent space. We demonstrate improved lossless compression rates in a variety of settings.

- Fri 11:30 a.m. - 11:35 a.m.: Spotlight 6: Lucas Theis and Jonathan Ho, Importance weighted compression (Contributed Talk)

  The connection between variational autoencoders (VAEs) and compression is well established and they have been used for both lossless and lossy compression. Compared to VAEs, importance-weighted autoencoders (IWAEs) achieve a larger bound on the log-likelihood. However, it is not well understood whether a similar connection between IWAEs and compression exists and whether the improved loss corresponds to better compression performance. Here we show that the loss of IWAEs can indeed be interpreted as the cost of lossless or lossy compression schemes, and using IWAEs for compression can lead to small improvements in performance.

- Fri 11:35 a.m. - 11:40 a.m.: Spotlight 7: Emilien Dupont, COIN: COmpression with Implicit Neural representations (Contributed Talk)

  We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state-of-the-art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.

- Fri 11:40 a.m. - 11:45 a.m.: Spotlight 8: Yunhao Ge, Graph Autoencoder for Graph Compression and Representation Learning (Contributed Talk)

  We consider the problem of graph data compression and representation. Recent developments in graph neural networks (GNNs) focus on generalizing convolutional neural networks (CNNs) to graph data, which includes redesigning convolution and pooling operations for graphs. However, few methods focus on effective graph compression to obtain a smaller graph, which can reconstruct the original full graph with less storage and can provide useful latent representations to improve downstream task performance. To fill this gap, we propose the Multi-kernel Inductive Attention Graph Autoencoder (MIAGAE), which, instead of compressing nodes/edges separately, utilizes the node similarity and graph structure to compress all nodes and edges as a whole. Similarity attention graph pooling selects the most representative nodes with the most information by using the similarity and topology among nodes. Our multi-kernel Inductive-Convolution layer can focus on different aspects and learn more general node representations in evolving graphs. We demonstrate that MIAGAE outperforms state-of-the-art methods for graph compression and few-shot graph classification, with superior graph representation learning.

- Fri 11:45 a.m. - 11:50 a.m.: Spotlight 9: George Zhang et al., Universal Rate-Distortion-Perception Representations for Lossy Compression (Contributed Talk)

  In the context of lossy compression, Blau and Michaeli (2019) adopt a mathematical notion of perceptual quality defined in terms of a distributional constraint and characterize the three-way tradeoff between rate, distortion and perception, generalizing the classical rate-distortion tradeoff. Within this rate-distortion-perception framework, we consider the notion of (approximately) universal representations in which one may fix an encoder and vary the decoder to (approximately) achieve any point along the perception-distortion tradeoff. We show that the penalty for fixing the encoder is zero in the Gaussian case, and give bounds in the case of arbitrary distributions. In principle, a small penalty obviates the need to design an end-to-end system for each particular objective. We provide experimental results on MNIST and SVHN to show that there exist practical constructions that suffer only a small penalty, i.e. machine learning models learn representation maps which are approximately universal within their operational capacities.

- Fri 11:50 a.m. - 11:55 a.m.: Spotlight 10: Leonhard Helminger et al., Lossy Image Compression with Normalizing Flows (Contributed Talk)

  Deep learning based image compression has recently witnessed exciting progress and in some cases even managed to surpass transform coding based approaches. However, state-of-the-art solutions for deep image compression typically employ autoencoders which map the input to a lower dimensional latent space and thus irreversibly discard information already before quantization. In contrast, traditional approaches in image compression employ an invertible transformation before performing the quantization step. Inspired by this, we propose a deep image compression method that is able to go from low bit-rates to near lossless quality by leveraging normalizing flows to learn a bijective mapping from the image space to a latent representation. We demonstrate further advantages unique to our solution, such as the ability to maintain constant quality results through re-encoding, even when performed multiple times. To the best of our knowledge, this is the first work leveraging normalizing flows for lossy image compression.

- Fri 11:55 a.m. - 12:25 p.m.: Q&A + Discussion Oral 2 & Spotlights 6-10 (Q&A + Discussion). Yang Yang

- Fri 12:25 p.m. - 12:30 p.m.: Closing Remarks (Moderation). Yingzhen Li

- Fri 12:30 p.m. - 1:30 p.m.: Poster Session
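
The store-the-weights idea in the COIN abstract (Spotlight 7) can be illustrated with a minimal sketch. This is not the author's implementation: the fitting step is omitted and random weights stand in for an MLP already overfitted to one image. The layer sizes, the `tanh` activation, the sigmoid output, the coordinate range of [-1, 1], and the use of a float16 cast as the quantizer are all assumptions made for illustration, and the function names are hypothetical. What the sketch shows is only the bookkeeping: the code for the image is the quantized weights, and decoding means evaluating the MLP at every pixel location.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """Random weights standing in for an MLP fitted to one image (fitting omitted)."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def quantize(params, dtype=np.float16):
    """Assumed quantizer: cast every weight and bias to a 16-bit float."""
    return [(W.astype(dtype), b.astype(dtype)) for W, b in params]


def code_size_bits(params):
    """Size of the stored code: total bits over all quantized weights and biases."""
    return sum(8 * (W.size * W.itemsize + b.size * b.itemsize) for W, b in params)


def decode(params, height, width):
    """Evaluate the MLP at every pixel location to reconstruct an RGB image."""
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height), np.linspace(-1.0, 1.0, width), indexing="ij"
    )
    h = np.stack([xs.ravel(), ys.ravel()], axis=1)  # shape (H*W, 2)
    for i, (W, b) in enumerate(params):
        h = h @ W.astype(np.float32) + b.astype(np.float32)
        if i < len(params) - 1:
            h = np.tanh(h)  # assumed hidden activation
    rgb = 1.0 / (1.0 + np.exp(-h))  # assumed sigmoid output, values in [0, 1]
    return rgb.reshape(height, width, 3)


rng = np.random.default_rng(0)
params = quantize(init_mlp([2, 16, 16, 3], rng))  # hypothetical tiny architecture
bits = code_size_bits(params)
image = decode(params, 32, 32)
bpp = bits / (32 * 32)  # bits per pixel for a 32x32 image
print(bits, bpp, image.shape)
```

In this sketch the rate depends only on the network size and the weight precision, not on the image content, so trading rate against distortion means changing the architecture or the quantizer.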