Poster

Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning

Zhuoxu Huang · Mengxi Jia · Hao Sun · Jungong Han · Xuelong Li

Abstract
