Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy (2nd Workshop)

A Geometric Perspective on Recursive Synthetic Training

Patrick Batsell ⋅ Thomas Walker ⋅ Richard Baraniuk


Abstract

Scaling up high-quality datasets is an effective way to improve generative models, but it is becoming increasingly difficult because of data scarcity and contamination. Naively bootstrapping generative models by training them on synthetic data leads to significant quality degradation and a collapse in sample diversity. In this paper, we study the negative effects of synthetic data on the geometry of deep generative networks (DGNs) to understand how to go beyond naive synthetic data utilization. Through empirical simulations, we show that retraining on synthetic data leads to DGNs with low-quality singular vectors and input-output Jacobians with low effective rank. Using these insights, we develop a negative-guidance strategy for generating synthetic data from a DGN that improves the DGN's quality.
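
The abstract does not say how effective rank is measured. As an illustration only, the sketch below assumes one common definition: the exponential of the Shannon entropy of the normalized singular values. The toy generator g(z) = W2 tanh(W1 z), its random weights, and the helper names `effective_rank` and `toy_generator_jacobian` are hypothetical stand-ins, not the networks or code used in the paper.

```python
import numpy as np

def effective_rank(matrix):
    """Exponential of the Shannon entropy of the normalized singular values.

    Assumed definition; the abstract does not specify which one is used.
    """
    s = np.linalg.svd(matrix, compute_uv=False)
    total = s.sum()
    if total == 0:
        return 0.0
    p = s / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))

def toy_generator_jacobian(z, w1, w2):
    """Input-output Jacobian of the toy map g(z) = w2 @ tanh(w1 @ z)."""
    h = np.tanh(w1 @ z)
    # Chain rule: dg/dz = w2 @ diag(1 - tanh^2(w1 @ z)) @ w1
    return w2 @ np.diag(1.0 - h**2) @ w1

# Hypothetical toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
latent_dim, hidden_dim, out_dim = 4, 16, 8
w1 = rng.normal(size=(hidden_dim, latent_dim))
w2 = rng.normal(size=(out_dim, hidden_dim))
z = rng.normal(size=latent_dim)

jac = toy_generator_jacobian(z, w1, w2)  # shape (out_dim, latent_dim)
print("effective rank at z:", effective_rank(jac))
```

Under this definition the value lies between 1 (a single dominant singular direction) and the smaller Jacobian dimension (all singular values equal), so a drop across retraining rounds would indicate that the generator is varying its output along fewer directions.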
