Are VLM Identity Judgments Logically Consistent? Evaluating Symmetry, Chain-of-Thought, and Transitivity in Person Re-Identification
Abstract
Vision-language models (VLMs) are increasingly used for visual reasoning tasks, yet their logical consistency remains poorly understood. We investigate whether VLMs make logically consistent identity judgments in person re-identification (re-ID), a task requiring fine-grained visual comparison. We propose three tests grounded in basic logical properties: (1) symmetry: whether the judgment "A is the same person as B" is invariant to presentation order; (2) transitivity: whether "A = B" and "B = C" together imply "A = C"; and (3) chain-of-thought consistency: whether explicit reasoning improves logical coherence. We evaluate four open-source VLMs (Qwen2-VL-7B, MiniCPM-V, Llama-3.2-Vision, LLaVA-NeXT-7B) alongside a CLIP embedding baseline on Market-1501. Our results reveal that two of the four VLMs exhibit degenerate behavior (always predicting DIFFERENT), while the non-degenerate models show 14–26% symmetry violations and up to 38.5% transitivity violations. Strikingly, we find an accuracy-consistency trade-off: the most accurate model (MiniCPM-V, 81.5% accuracy) has the lowest symmetry rate (74%), while the perfectly symmetric CLIP baseline achieves only 52.6% accuracy. These findings highlight a fundamental gap between VLM accuracy and logical coherence.
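The symmetry and transitivity tests can be scored roughly as follows. This is a minimal sketch, assuming each model is wrapped as a hypothetical function `judge(x, y)` that returns True for SAME and False for DIFFERENT. The function names, the toy judges, and the choice to count transitivity violations only over triplets whose two premises are both judged SAME are illustrative assumptions, not the paper's evaluation code. The percentages reported above may use different sampling or denominators. Subtracting the symmetry violation rate from 1 gives a consistency rate.

```python
from itertools import combinations, permutations
from typing import Callable, Hashable, Iterable, Tuple

# Hypothetical interface: a judge returns True for "SAME person", False for "DIFFERENT".
Judge = Callable[[Hashable, Hashable], bool]


def symmetry_violation_rate(judge: Judge, pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Fraction of pairs whose verdict flips when the presentation order is swapped."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    flips = sum(judge(a, b) != judge(b, a) for a, b in pairs)
    return flips / len(pairs)


def transitivity_violation_rate(
    judge: Judge, triplets: Iterable[Tuple[Hashable, Hashable, Hashable]]
) -> float:
    """Among triplets where A = B and B = C are both judged SAME, fraction where A = C is not."""
    eligible = 0
    violations = 0
    for a, b, c in triplets:
        if judge(a, b) and judge(b, c):  # both premises hold
            eligible += 1
            if not judge(a, c):  # conclusion fails
                violations += 1
    # Assumption: with no eligible triplets, report 0.0 rather than undefined.
    return violations / eligible if eligible else 0.0


# Hypothetical toy judges for illustration only (not models from the paper).
FEATURES = {"a": 0, "b": 6, "c": 12}  # made-up 1-D "embeddings"


def threshold_judge(x: Hashable, y: Hashable) -> bool:
    # Distance threshold: symmetric by construction, yet not transitive.
    return abs(FEATURES[x] - FEATURES[y]) < 7


def always_different(x: Hashable, y: Hashable) -> bool:
    # Degenerate judge that never answers SAME.
    return False


def order_biased(x: Hashable, y: Hashable) -> bool:
    # Verdict depends only on presentation order, so every pair flips.
    return str(x) < str(y)


if __name__ == "__main__":
    items = ["a", "b", "c"]
    judges = [
        ("threshold", threshold_judge),
        ("always_different", always_different),
        ("order_biased", order_biased),
    ]
    for name, judge in judges:
        sym = symmetry_violation_rate(judge, combinations(items, 2))
        trans = transitivity_violation_rate(judge, permutations(items, 3))
        print(f"{name}: symmetry violations={sym:.2f}, transitivity violations={trans:.2f}")
    # threshold: symmetry violations=0.00, transitivity violations=1.00
    # always_different: symmetry violations=0.00, transitivity violations=0.00
    # order_biased: symmetry violations=1.00, transitivity violations=0.00
```

The toy judges show that, under these definitions, the two properties are independent. A thresholded-distance judge is perfectly symmetric but can still break transitivity, and an order-biased judge can be transitive while failing symmetry. A judge that always answers DIFFERENT scores zero violations on both tests, so in this sketch the consistency numbers only mean something when read alongside accuracy.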