

Image Clustering Conditioned on Text Criteria

Sehyun Kwon · Jaden Park · Minkyu Kim · Jaewoong Cho · Ernest K Ryu · Kangwook Lee

Halle B #45
Thu 9 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract: Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified criteria in the form of text by leveraging modern Vision-Language Models and Large Language Models. We call our method Image Clustering Conditioned on Text Criteria (IC$|$TC), and it represents a different paradigm of image clustering. IC$|$TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC$|$TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, significantly outperforming baselines.
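As a rough illustration of clustering conditioned on a text criterion, the sketch below shows one way such a pipeline could be organized. It is not the authors' implementation: the function `cluster_by_text_criterion`, the `describe` and `assign` callables (standing in for a Vision-Language Model and a Large Language Model), and the fixed list of cluster names are all hypothetical, and the toy stand-ins only match strings so the example runs without any model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def cluster_by_text_criterion(
    images: Sequence[str],
    criterion: str,
    describe: Callable[[str, str], str],
    assign: Callable[[str, Sequence[str]], str],
    cluster_names: Sequence[str],
) -> Dict[str, List[str]]:
    """Group images under cluster names according to a text criterion.

    describe: hypothetical stand-in for a VLM call; maps (image, criterion)
              to a text description focused on the criterion.
    assign:   hypothetical stand-in for an LLM call; maps
              (description, cluster_names) to one of the cluster names.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for image in images:
        description = describe(image, criterion)
        label = assign(description, cluster_names)
        if label not in cluster_names:
            raise ValueError(f"unexpected cluster label: {label!r}")
        clusters[label].append(image)
    return dict(clusters)


# Toy stand-ins (assumptions for illustration only; no real model is called).
def toy_describe(image: str, criterion: str) -> str:
    # Pretend the file name already describes the image.
    return f"{criterion}: {image}"


def toy_assign(description: str, names: Sequence[str]) -> str:
    # Pick the first cluster name mentioned in the description.
    for name in names:
        if name in description:
            return name
    return names[0]


if __name__ == "__main__":
    result = cluster_by_text_criterion(
        ["person running.jpg", "person reading.jpg", "dog running.jpg"],
        criterion="human action",
        describe=toy_describe,
        assign=toy_assign,
        cluster_names=["running", "reading"],
    )
    print(result)
```

In this sketch, changing the `criterion` string (for example from "human action" to "physical location") would change what the description step attends to, which is the kind of user control the abstract describes.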
