
In-Person Poster presentation / poster accept

GAMR: A Guided Attention Model for (visual) Reasoning

Mohit Vaishnav · Thomas Serre

MH1-2-3-4 #128

Keywords: [ external memory ] [ compositional learning ] [ out-of-distribution generalization ] [ visual routines ] [ zero shot generalization ] [ abstract visual reasoning ] [ Neuroscience and Cognitive Science ]

Abstract: Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which instantiates an active vision theory. This theory posits that the brain solves complex visual reasoning problems dynamically, via sequences of attention shifts that select and route task-relevant visual information into memory. Experiments on an array of visual reasoning tasks and datasets demonstrate GAMR's ability to learn visual routines robustly and sample-efficiently. In addition, GAMR is shown to be capable of zero-shot generalization to completely novel reasoning tasks. Overall, our work provides computational support for cognitive theories that postulate the need for a critical interplay between attention and memory to dynamically maintain and manipulate task-relevant visual information when solving complex visual reasoning tasks.
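The abstract describes the attention-to-memory loop only at a high level. As a rough illustration, and not GAMR's actual architecture, the sketch below assumes a scene already encoded as a list of feature vectors, dot-product soft attention driven by a query, and a plain list standing in for external memory. The names `attend` and `guided_attention_loop`, the `inhibition` parameter, and the rule that updates the query by subtracting already-stored glimpses are all hypothetical stand-ins for a learned controller.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, features: List[Vector]) -> Tuple[Vector, Vector]:
    """Soft attention: weight each feature vector by its dot-product match to the query."""
    weights = softmax([sum(q * f for q, f in zip(query, feat)) for feat in features])
    dim = len(features[0])
    glimpse = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return glimpse, weights


def guided_attention_loop(
    features: List[Vector], initial_query: Vector, steps: int, inhibition: float = 10.0
) -> List[Vector]:
    """Hypothetical loop: shift attention `steps` times, routing each glimpse into memory.

    The query update (subtracting already-stored glimpses) is an assumed stand-in
    for a learned controller; it simply discourages re-attending the same item.
    """
    memory: List[Vector] = []
    query = list(initial_query)
    for _ in range(steps):
        glimpse, _ = attend(query, features)
        memory.append(glimpse)  # route the attended information into external memory
        stored = [sum(m[d] for m in memory) for d in range(len(query))]
        query = [q0 - inhibition * s for q0, s in zip(initial_query, stored)]
    return memory


# Usage: two "objects" in a toy scene; attention should shift from the first to the second.
scene = [[1.0, 0.0], [0.0, 1.0]]
memory = guided_attention_loop(scene, initial_query=[5.0, 0.0], steps=2)
```

In this toy the query update is hand-coded so that each shift moves attention away from what is already in memory; a trained model would instead learn how to direct each shift for the task at hand.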
