Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
Shikang Zheng · Guantao Chen · Qinming Zhou · Yuqi Lin · Lixuan He · Chang Zou · Peiliang Cai · Jiacheng Liu · Linfeng Zhang
Abstract
Diffusion Transformers (DiTs) offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck due to the high cost of transformer forward passes at each timestep. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. However, existing methods often apply a uniform caching strategy across all feature dimensions, ignoring their heterogeneous dynamic behaviors. We therefore adopt a new perspective, modeling hidden feature evolution as a mixture of ODEs across dimensions, and introduce \textbf{HyCa}, a hybrid ODE-solver-inspired caching framework that applies dimension-wise caching strategies. HyCa achieves near-lossless acceleration across diverse domains and models without retraining, including a 5.56$\times$ speedup on FLUX and HunyuanVideo and a 6.24$\times$ speedup on Qwen-Image and Qwen-Image-Edit. \emph{Our code is provided in the supplementary material and will be released on GitHub.}
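To make the dimension-wise idea concrete, below is a minimal sketch of hybrid per-dimension caching in PyTorch. All names (`assign_solvers`, `hybrid_cache_step`, the threshold) are our own illustrative assumptions, not the paper's implementation; it uses two toy "solvers" (zero-order cache reuse and first-order linear extrapolation) to show how each channel can be updated by its own rule on a skipped timestep instead of running the full transformer.

```python
import torch

def assign_solvers(prev: torch.Tensor, curr: torch.Tensor,
                   thresh: float = 1e-3) -> torch.Tensor:
    """Pick a per-dimension caching strategy from observed feature dynamics.

    Channels whose step-to-step change is small are treated as slowly
    varying and simply reused; fast-moving channels get extrapolation.
    (Hypothetical heuristic, not the paper's actual assignment rule.)
    """
    # Per-channel drift, averaged over batch and token axes.
    delta = (curr - prev).abs().mean(dim=tuple(range(curr.dim() - 1)))
    return (delta > thresh).long()  # 0 = cache reuse, 1 = extrapolate

def hybrid_cache_step(prev: torch.Tensor, curr: torch.Tensor,
                      solver_id: torch.Tensor) -> torch.Tensor:
    """Forecast the next hidden state without a transformer forward pass."""
    reuse = curr                    # zero-order hold: keep the cached feature
    extrap = curr + (curr - prev)   # first-order: linear forecast along the trajectory
    return torch.where(solver_id.bool(), extrap, reuse)

# Usage: after two real forward passes, forecast the next step per channel.
prev, curr = torch.randn(1, 16, 64), torch.randn(1, 16, 64)  # (batch, tokens, channels)
solver_id = assign_solvers(prev, curr)
forecast = hybrid_cache_step(prev, curr, solver_id)
```

The design point this illustrates is that the solver choice is a per-channel index, so arbitrary mixtures of cheap update rules can coexist in one cached step at negligible cost relative to a transformer forward pass.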