Poster

Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn’t and What's Next

Zhulin Hu · Yan Ma · Jiadi Su · I-Chun Chern · Pengfei Liu


Abstract:

In this blog post, we explore the advances and challenges in fine-tuning unified token-based large multimodal models, focusing on the Chameleon architecture and its fine-tuned variant, Anole. Released in 2024, these models exemplify a modern approach to integrating diverse data modalities through tokens, simplifying modality fusion and leveraging established techniques from large language models. The post details our research efforts to reveal what works, what doesn't, and what is worth exploring in future research on the fine-tuning process.
