

Poster

Influence-Guided Diffusion for Dataset Distillation

Mingyang Chen · Jiawei Du · Bo Huang · Yi Wang · Xiaobo Zhang · Wei Wang

Hall 3 + Hall 2B #62
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

Dataset distillation aims to streamline training by condensing a much larger original dataset into a compact yet effective one. However, existing methods often struggle to distill large, high-resolution datasets due to prohibitive resource costs and limited performance, which stem primarily from sample-wise optimization in pixel space. Motivated by the remarkable ability of diffusion generative models to learn target dataset distributions and controllably sample high-quality data tailored to user needs, we propose framing dataset distillation as a controlled diffusion generation task that produces data specifically tailored for effective training. By establishing a correlation between the overarching objective of dataset distillation and the trajectory influence function, we introduce the Influence-Guided Diffusion (IGD) sampling framework, which generates training-effective data without retraining diffusion models. We design an efficient guidance function that uses the trajectory influence function as an indicator to steer diffusion models toward data with promoted influence and enhanced diversity. Extensive experiments show that integrating our IGD method significantly improves the training performance of distilled datasets generated by diffusion models, achieving state-of-the-art results in distilling ImageNet datasets. In particular, IGD reaches 60.3% on ImageNet-1K at IPC=50. Our code is available at https://github.com/mchen725/DD_IGD.
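The abstract does not spell out the sampling procedure, but the core idea of steering a pretrained diffusion model with an influence-based guidance signal can be sketched as a guidance-perturbed DDIM loop. Everything below (the `denoiser` and `influence_score` callables, the `guidance_scale` knob, and the toy schedule) is an illustrative assumption, not the paper's exact guidance function:

```python
import torch

def igd_sample(denoiser, influence_score, shape, alphas_cumprod, guidance_scale=1.0):
    """Guidance-perturbed DDIM sampling (sketch).

    At each denoising step, the predicted noise is shifted by the gradient
    of an influence-based score, steering samples toward data assumed to be
    more effective for training downstream models.
    """
    T = alphas_cumprod.shape[0]
    x = torch.randn(shape)                                  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        abar_t = alphas_cumprod[t]
        abar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)

        x = x.detach().requires_grad_(True)
        eps = denoiser(x, t)                                # eps_theta(x_t, t)
        # Predict the clean sample from the current noisy state.
        x0_hat = (x - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
        # Guidance signal: gradient of the influence score w.r.t. x_t.
        grad = torch.autograd.grad(influence_score(x0_hat).sum(), x)[0]
        # Shift the noise prediction along the guidance direction.
        eps = eps - guidance_scale * (1 - abar_t).sqrt() * grad
        x0_hat = (x - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
        # Deterministic DDIM update x_t -> x_{t-1}.
        x = (abar_prev.sqrt() * x0_hat + (1 - abar_prev).sqrt() * eps).detach()
    return x

# Toy usage with stand-in components (untrained "denoiser", placeholder score).
denoiser = lambda x, t: torch.zeros_like(x)
influence = lambda x0: -(x0 ** 2).sum(dim=(1, 2, 3))        # hypothetical score
abar = torch.linspace(0.9999, 0.01, steps=50)               # toy noise schedule
samples = igd_sample(denoiser, influence, (4, 3, 8, 8), abar, guidance_scale=0.5)
print(samples.shape)                                        # torch.Size([4, 3, 8, 8])
```

In the paper's setting, `influence_score` would be derived from the trajectory influence function and `denoiser` would be a pretrained diffusion model; no retraining of the diffusion model is required because the guidance is applied purely at sampling time.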
