

GridWM-Judge: Evaluating Vision-Language Model Judges in Grid Worlds via World Model Deficits

Qinan Zhang ⋅ Qihang Jin

Abstract
