Poster Thu, Apr 23, 2026 • 11:15 AM – 1:45 PM PDT Pavilion 3 P3-#1921

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Yanghao Li ⋅ Rui Qian ⋅ Bowen Pan ⋅ Haotian Zhang ⋅ Haoshuo Huang ⋅ Bowen Zhang ⋅ Jialing Tong ⋅ Haoxuan You ⋅ Xianzhi Du ⋅ Zhe Gan ⋅ Hyunjik Kim ⋅ Chao Jia ⋅ Zhenbang Wang ⋅ Yinfei Yang ⋅ Mingfei Gao ⋅ Zi-Yi Dou ⋅ Wenze Hu ⋅ Chang Gao ⋅ Dongxu Li ⋅ Philipp Dufter ⋅ Zirui Wang ⋅ Guoli Yin ⋅ Zhengdong Zhang ⋅ Chen Chen ⋅ Yang Zhao ⋅ Ruoming Pang ⋅ Zhifeng Chen

Abstract
