Rethinking Diffusion Models for Science: From Generation to Faithful Reconstruction
Abstract
Diffusion models have achieved remarkable success as generative tools, yet their impact on scientific applications, where fidelity, consistency, and reliability are paramount, remains limited. In this talk, I will argue that the central challenge is not generation, but faithful reconstruction. I will first present a perspective on diffusion for inverse problems, emphasizing how standard formulations favor perceptual quality at the expense of faithfulness to measurements, and how this tension can be addressed through data-consistent, incremental reconstruction mechanisms that better align with scientific objectives. I will then turn to modern text-to-image systems and show that, despite their impressive visual realism, diffusion models struggle to enforce even simple global constraints such as object counts, reflecting deeper issues in how structure is formed during the sampling process. I will discuss how lightweight test-time steering can partially mitigate these failures, and what the failures reveal about their underlying causes. Taken together, these examples point to a broader misalignment between current generative paradigms and the needs of science. I will conclude by arguing that progress will require not only better modeling, but also fundamentally better data, and briefly discuss our efforts toward curating and releasing MosaicMRI, one of the largest and most diverse medical imaging datasets to date, as a step in this direction.