When Tokens Decay and Turns Amplify: A Dual-Granularity Framework for Multi-Turn Preference Optimization
Yangyi Fang ⋅ Jiaye Lin ⋅ Xiaoliang Fu ⋅ Haolin Shi ⋅ Cong Qin ⋅ Chaowen Hu
Abstract
Multi-turn dialogue alignment is challenging because tokens and turns contribute unequally to preference signals. Existing methods apply uniform token weighting or binary turn selection, overlooking this fine-grained structure. We present T$^3$PO, a dual-granularity framework that combines (i) token-level temporal discounting, which prioritizes early high-signal tokens with provable partition function cancellation, and (ii) turn-level self-evaluated weighting via multi-perspective scoring, which eliminates external dependencies. Experiments across multiple benchmarks and model scales show consistent improvements over baselines, and ablations confirm that both mechanisms contribute independently.