Poster
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
Pritam Sarkar · Sayna Ebrahimi · Ali Etemad · Ahmad Beirami · Sercan Arik · Tomas Pfister
Hall 3 + Hall 2B #113
Despite their significant advancements, Multimodal Large Language Models (MLLMs) often generate factually inaccurate information, referred to as hallucination. In this work, we address object hallucinations in MLLMs, where information is generated about an object not present in the input image. We introduce Data-augmented Phrase-level Alignment (DPA), a novel loss which can be applied to instruction-tuned off-the-shelf MLLMs to mitigate hallucinations, while preserving their general vision-language capabilities. To fine-tune MLLMs with DPA, we first generate a set of 'hallucinated' and 'correct' response pairs through generative data augmentation by selectively altering the ground-truth information of the correct responses at a phrase level. The DPA loss is then used to train MLLMs to reduce the likelihood of hallucinated phrases compared to the correct ones. Our thorough evaluation on various benchmarks confirms the effectiveness of DPA in mitigating hallucination while retaining the out-of-the-box performance of the MLLMs on general tasks. For instance, MLLMs fine-tuned with DPA, which we refer to as Hallucination Attenuated Language and Vision Assistant (HALVA), improve F1 by up to 13.4% on hallucination visual question answering and reduce the hallucination rate by up to 4.2% on image description tasks.
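As a rough illustration of the kind of phrase-level objective the abstract describes, the sketch below contrasts the model's likelihood of a correct phrase against that of its hallucinated counterpart. This is only a minimal sketch of a preference-style loss consistent with the description above, not the paper's exact DPA formulation; the function name, tensor shapes, and phrase-mask scheme are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def phrase_level_alignment_loss(correct_logits, hallucinated_logits,
                                correct_ids, hallucinated_ids,
                                correct_phrase_mask, hallucinated_phrase_mask):
    """Illustrative phrase-level preference loss (not the exact DPA loss).

    The correct and hallucinated responses share the same prompt/image; the
    masks are 1 only on the altered phrase tokens, so unaltered tokens do not
    contribute to the comparison.
    Shapes (assumed): logits [B, T, V], ids [B, T], masks [B, T].
    """
    # Per-token log-likelihood of each reference token under the model.
    logp_correct = torch.gather(
        F.log_softmax(correct_logits, dim=-1), -1, correct_ids.unsqueeze(-1)
    ).squeeze(-1)
    logp_halluc = torch.gather(
        F.log_softmax(hallucinated_logits, dim=-1), -1, hallucinated_ids.unsqueeze(-1)
    ).squeeze(-1)

    # Aggregate log-probabilities over the phrase spans only.
    correct_score = (logp_correct * correct_phrase_mask).sum(-1)
    halluc_score = (logp_halluc * hallucinated_phrase_mask).sum(-1)

    # Bradley-Terry-style term: raise the likelihood of the correct phrase
    # relative to the hallucinated one.
    return -F.logsigmoid(correct_score - halluc_score).mean()
```

In practice such a term would typically be combined with the standard instruction-tuning loss (or a constraint toward the original model) so that general vision-language capabilities are preserved, as the abstract emphasizes.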