SDErasure: Concept-Specific Trajectory Shifting for Concept Erasure via Adaptive Diffusion Classifier
Fengyuan Miao · Shancheng Fang · Lingyun Yu · Yadong Qu · Yuhao Sun · Xiaorui Wang · Hongtao Xie
Abstract
Concept erasure methods have proven effective in mitigating the potential for text‑to‑image diffusion models to produce harmful content. Nevertheless, prevailing methods based on post fine-tuning introduce substantial disruption to the original model’s parameter distribution and suffer from excessive model intrusiveness in two dimensions. (1) Images generated under erased concepts are perceptually aberrant. (2) Images generated under unrelated concepts exhibit pronounced quality degradation. We attribute these limitations to applying a uniform strategy to erase diverse concepts, failing to account for concept-specific generative mechanisms. Through rigorous experimentation and analysis, we identify that the generative process of each concept hinges on a narrow subset of critical timesteps. This insight motivates a targeted intervention strategy that enables precise and minimally invasive concept erasure. Therefore, we introduce $\textbf{SDErasure}$, a novel training framework for concept-specific erasure via adaptive trajectory shifting. First, a Step Selection algorithm that utilizes a diffusion classifier is proposed to guide the model in pinpointing the key timesteps associated with the undesired concept’s generation. Second, a Score Rematching loss is introduced to align the model’s predicted score function with that of anchor concepts, extending its applicability to both anchor-free erasing and anchor-based altering. Third, a Quality Regulation consisting of early-preserve loss and concept-retain loss is introduced to maintain the model's generative quality along two dimensions. Empirical results demonstrate that SDErasure achieves state-of-the-art concept erasure performance, reducing FID from 9.51 to 6.74 while effectively eliminating the target concept.
Successful Page Load