Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy
How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
Alessandro Favero · Antonio Sclocchi · Francesco Cagnetta · Pascal Frossard · Matthieu Wyart
Keywords: [ compositionality ] [ generalization ] [ diffusion models ] [ probabilistic graphical models ] [ sample complexity ] [ Science of deep learning ]
Natural data is often organized as a hierarchical combination of features. How many samples does a generative model need to learn the rules governing how these features are composed, so as to produce a combinatorial number of novel samples? What signal in the data is exploited to learn them? We investigate these questions both theoretically and empirically. Theoretically, we consider diffusion models trained on simple probabilistic context-free grammars, tree-like graphical models used to represent the structure of data such as language and images. We demonstrate that diffusion models learn compositional rules with the sample complexity required to cluster together features with statistically similar contexts. This clustering emerges hierarchically: higher-level, more abstract structures require more data to be identified. The mechanism leads to a sample complexity that scales polynomially, rather than exponentially, with the dimension of the structure considered. We thus predict that diffusion models trained on intermediate dataset sizes generate data that is locally coherent up to a certain scale but lacks global coherence. Finally, we measure coherence systematically in data generated by diffusion models trained in different domains. We find remarkable agreement with our predictions: generated text and images achieve progressively larger coherence lengths as training time or dataset size grows.
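To make the clustering mechanism concrete, here is a minimal, hypothetical sketch; the toy grammar and the context-distance measure below are illustrative assumptions, not the grammars or training pipeline used in the paper. It samples strings from a small probabilistic context-free grammar in which two terminals share the same hidden nonterminal parent, and shows that such terminals have statistically indistinguishable contexts, so they can be grouped by comparing their empirical neighbour distributions.

```python
import random
from collections import Counter, defaultdict

# Toy PCFG (illustrative only). Terminals 'a'/'b' are interchangeable
# children of nonterminal A, and 'c'/'d' of B, so they share contexts.
GRAMMAR = {
    "S": [("A", "B"), ("B", "A")],
    "A": [("a",), ("b",)],
    "B": [("c",), ("d",)],
}

def sample(symbol="S"):
    """Recursively expand `symbol`; return the leaf sequence of the tree."""
    if symbol not in GRAMMAR:  # terminal symbol: emit as-is
        return [symbol]
    out = []
    for s in random.choice(GRAMMAR[symbol]):  # pick a production uniformly
        out.extend(sample(s))
    return out

def context_stats(corpus):
    """Empirical distribution over each token's immediate neighbours."""
    ctx = defaultdict(Counter)
    for seq in corpus:
        for i, tok in enumerate(seq):
            if i > 0:
                ctx[tok][seq[i - 1]] += 1
            if i < len(seq) - 1:
                ctx[tok][seq[i + 1]] += 1
    # Normalise counts into probability distributions.
    return {t: {k: v / sum(c.values()) for k, v in c.items()}
            for t, c in ctx.items()}

def tv_distance(p, q):
    """Total-variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

corpus = [sample() for _ in range(10_000)]
stats = context_stats(corpus)
tokens = sorted(stats)
for i, t in enumerate(tokens):
    for u in tokens[i + 1:]:
        print(f"d({t},{u}) = {tv_distance(stats[t], stats[u]):.3f}")
```

Running this prints near-zero distances for 'a' vs 'b' and for 'c' vs 'd', and large distances across the two groups: clustering tokens by their context statistics recovers the grammar's hidden variables. This is the kind of signal the abstract argues diffusion models exploit, applied level by level, with higher levels of the tree requiring more samples to estimate reliably.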