

Poster in Workshop: Navigating and Addressing Data Problems for Foundation Models (DPFM)

Toward Data-driven Skill Identification for General-purpose Vision-language Models

Anthony Tiong · Junqi Zhao · Junnan Li · Steven Hoi · Caiming Xiong · Boyang Albert Li

Keywords: [ data resources ] [ factor analysis ] [ vision-language models ] [ data analysis ]


Abstract:

The evolution of vision-language (VL) models toward broad competencies has complicated benchmarking, since accurate evaluation now requires a diverse set of tasks. Moving beyond the intuition-guided task selection common in existing benchmarks, we propose a data-driven approach that leverages transfer-learning performance and factor analysis to identify latent skills crucial for VL tasks. Our study demonstrates the utility of factor analysis in guiding the systematic understanding and evaluation of vision-language models.
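
As a rough illustration of how factor analysis can surface latent skills from a table of task scores, the sketch below runs a principal-component factor extraction on a synthetic performance matrix. It is not the authors' pipeline: the data are randomly generated, the row/column interpretation (rows as hypothetical fine-tuned models, columns as target tasks), the two planted "skills", and the helper name `extract_factors` are all assumptions made for the example, and the extraction method is a simple stand-in for whichever factor-analysis variant the paper uses.

```python
import numpy as np


def extract_factors(perf, n_factors):
    """Principal-component factor extraction (an illustrative stand-in).

    perf: array of shape (n_observations, n_tasks) holding task scores.
    Returns (loadings, eigenvalues) for the n_factors largest factors.
    """
    # Correlation between task columns; corrcoef standardizes each column.
    corr = np.corrcoef(perf, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    # Scale eigenvectors by sqrt(eigenvalue) to obtain factor loadings.
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return loadings, eigvals[order]


# Synthetic data (assumption): 500 hypothetical models, 6 target tasks.
# Tasks 0-3 depend on one planted latent skill, tasks 4-5 on another.
rng = np.random.default_rng(0)
n_obs = 500
latent = rng.normal(size=(n_obs, 2))
true_loadings = np.array([
    [0.9, 0.0],
    [0.9, 0.0],
    [0.9, 0.0],
    [0.9, 0.0],
    [0.0, 0.9],
    [0.0, 0.9],
])
perf = latent @ true_loadings.T + 0.3 * rng.normal(size=(n_obs, 6))

loadings, eigvals = extract_factors(perf, n_factors=2)

# Assign each task to the factor on which it loads most strongly.
groups = np.abs(loadings).argmax(axis=1)
print("loadings:\n", np.round(loadings, 2))
print("task -> factor:", groups)
```

On real data the interesting step is interpreting the loading matrix: tasks that load on the same factor are candidates for sharing an underlying skill. A rotation such as varimax is commonly applied before interpretation and is omitted here for brevity.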
