Manifold Generalization Provably Precedes Memorization in Diffusion Models
Abstract
Diffusion models often produce novel samples when the learned score is coarse, a behavior not explained by the standard interpretation of diffusion training as density estimation. In this paper, we show that, under the manifold hypothesis, this phenomenon can be explained by coarse scores encoding the geometric structure of the data while discarding fine-scale distributional information about the population measure μ_data. Concretely, while estimating a density supported on a k-dimensional manifold from N samples is known to suffer from the slow rate O(N^{-1/k}), we prove that geometric learning via coarse scores attains a fast, near-parametric rate O(N^{-1}) for learning a different distribution whose density is comparable to that of μ_data within any O(N^{-(β-1)/(4k)})-neighborhood on the manifold, where β denotes the manifold smoothness. It follows that, when the manifold is sufficiently smooth (in particular, β > 5), generalization, formalized as the ability to generate novel high-fidelity samples, occurs at a statistical rate strictly faster than that required for density estimation of μ_data.
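One way to read the threshold β > 5 is as a comparison of exponents. This is an interpretation that the abstract does not state explicitly: we assume the relevant comparison is between the neighborhood radius N^{-(β-1)/(4k)} and the density estimation rate N^{-1/k}. Under that assumption, the short calculation below shows that the neighborhood radius decays strictly faster than the density estimation rate exactly when β > 5.

```latex
% Assumption (ours): the threshold comes from comparing the two exponents.
\[
  N^{-\frac{\beta-1}{4k}} = o\!\left(N^{-\frac{1}{k}}\right)
  \quad\Longleftrightarrow\quad
  \frac{\beta-1}{4k} > \frac{1}{k}
  \quad\Longleftrightarrow\quad
  \beta - 1 > 4
  \quad\Longleftrightarrow\quad
  \beta > 5 .
\]
```

The intrinsic dimension k cancels in this comparison, so under this reading the threshold on β does not depend on k.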