Keywords: [ energy-based models ]
Mutual information (MI) is a fundamental quantity in information theory and machine learning. However, direct estimation of mutual information is intractable, even if the true joint probability density for the variables of interest is known, as it involves estimating a potentially high-dimensional log partition function. In this work, we view mutual information estimation from the perspective of importance sampling. Since naive importance sampling with the marginal distribution as a proposal requires exponential sample complexity in the true mutual information, we propose several improved proposals which assume additional density information is available. In settings where the full joint distribution is available, we propose Multi-Sample Annealed Importance Sampling (AIS) bounds on mutual information, which we demonstrate can tightly estimate large values of MI in our experiments. In settings where only a single marginal distribution is known, our MINE-AIS method improves upon existing variational methods by directly optimizing a tighter lower bound on MI, using energy-based training to estimate gradients and Multi-Sample AIS for evaluation. Our methods are particularly suitable for evaluating MI in deep generative models, since explicit forms for the marginal or joint densities are often available. We evaluate our bounds on estimating the MI of VAEs and GANs trained on the MNIST and CIFAR datasets, and showcase significant gains over existing bounds in these challenging settings with high ground truth MI.