Poster

Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning

Haoran Dang · Cuiling Lan · Hai Wan · Xibin Zhao · Yan Lu

Abstract
