What Does a Visual Formal Analysis of the World's 500 Most Famous Paintings Tell Us About Multimodal LLMs?

Muzi Tao · Saining Xie


This work introduces ArtQA, a new benchmark that evaluates multimodal LLMs through the lens of formal analysis of paintings. We focus on line, shape, space, color, form, value, and texture, collectively known as the elements of art in visual formal analysis. ArtQA contains questions spanning 4 metrics, which are further divided into 16 fine-grained categories. We use LLMs to generate VQA questions based on formal analyses of 500 renowned paintings. These questions undergo a rigorous filtering process that combines model annotation with human expert review, ensuring ArtQA's quality and reliability.

