ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Jieyu Zhang ⋅ Le Xue ⋅ Linxin Song ⋅ Jun Wang ⋅ Weikai Huang ⋅ Manli Shu ⋅ An Yan ⋅ Zixian Ma ⋅ Juan Carlos Niebles ⋅ Silvio Savarese ⋅ Caiming Xiong ⋅ Zeyuan Chen ⋅ Ranjay Krishna ⋅ Ran Xu

Abstract
