Amodal Completion of Occluded Objects in Deep Neural Networks: A Psychophysical Investigation
Abstract
Humans have a remarkable ability to complete the shape of an occluded object in a visual scene (Rensink & Enns, 1998). Standard feed-forward computer vision architectures, however, show diminished classification performance when objects are partially occluded, whereas humans perform well (Tang et al., 2018). This difference in the ability to classify occluded objects points to different internal mechanisms in the two visual systems. It remains unknown to what extent these models possess amodal completion, or how human-like this ability is. We tested a neuroscience-inspired visual search model with a VGG16 backbone on four psychophysical tasks involving amodal completion that are standard in human studies. We found that neither a traditional feed-forward model trained on ImageNet nor a fine-tuned model with a recurrent layer is able to automatically complete occluded objects, suggesting that bridging the gap to natural vision requires more sophisticated mechanisms for extracting and integrating global structural features.