Bigger Is Not Better Under Differential Privacy: Optimization Failure at Eleven-Billion Scale in Vision–Language Model Fine-Tuning
TZU-EN SU ⋅ Li-Hong Guo ⋅ Yangmi Su ⋅ Cheng-Yen Li
Abstract
Differential privacy (DP) is an appealing safeguard for adapting instruction-tuned vision–language models (VLMs), but its scaling behavior under standard private fine-tuning remains unclear. We present a focused negative result for DP-SGD fine-tuning with LoRA on two widely used backbones: PaLI-GEMMA-3B-PT-224 and LLaMA-3.2-11B-Vision-Instruct. We target $(\varepsilon \in \{1,10,100,1000\}, \delta=10^{-5})$ by tuning only the DP-SGD noise scale while keeping the data, epochs, batch size, clipping norm, and all non-private hyperparameters fixed; $\varepsilon$ is computed with standard DP accounting. Across all budgets, the 3B model trains stably and stays close to non-private validation performance, whereas the 11B model becomes optimization-limited, with flat, high loss and collapsed lexical-overlap metrics. BERTScore drops much less, revealing a lexical–semantic mismatch that can obscure residual utility under DP. Our findings suggest that scaling DP-SGD+LoRA to 11B may require additional stability interventions beyond simply relaxing $\varepsilon$.
Chat is not available.
Successful Page Load