ICLR Poster Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models

Poster

Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models

Ali Khajegili Mirabadi · Katherine Rich · Hossein Farahani · Ali Bashashati

Hall 3 + Hall 2B #14

[ Abstract ]

Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Do Large Vision Models (LVMs) extract medically and semantically relevant features similar to those identified by human experts? Currently, only biased, qualitative approaches with limited, small-scale expert evaluations are available to answer this question. In this study, we propose the Boltzmann Semantic Score (BSS), a novel method inspired by state space modeling, to evaluate the encoding space of LVMs from medical images using the encoding space of Large Language Models (LLMs) from medical reports. Through extensive experimentation on 32 datasets from The Cancer Genome Atlas collection using five state-of-the-art LLMs, we first establish a baseline of LLMs' performance in digital pathology and show that LLMs' encoding can be linked to patient outcomes. Then, we compared seven LVMs with BSS and showed that LVMs suffer from poor semantic capability when compared with encoded expert knowledge from pathology reports.We also found statistically significant correlations between BSS (as a measure of structural similarity) and performance in two downstream tasks: information retrieval and survival prediction tasks. Our study also investigates the consensus among LLMs in evaluating LVMs using BSS, indicating that LLMs generally reach substantial consensus in rating LVMs, with some variation dependant on the cancer type. We believe the BSS metric proposed here holds significant potential for application in other domains with similar contexts. Data and code can be found in \footnotesize \url{ https://github.com/AIMLab-UBC/Boltzmann}

Live content is unavailable. Log in and register to view live content