

DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning

Chi-Min Chan ⋅ Ehsan Hajiramezanali ⋅ Xiner Li ⋅ Edward De Brouwer ⋅ Carl Edwards ⋅ Wei Xue ⋅ Sirui Han ⋅ Yike Guo ⋅ Gabriele Scalia

