

Poster in Workshop: First Workshop on Representational Alignment (Re-Align)

Human and Deep Neural Network Alignment in Navigational Affordance Perception

Clemens Bartnik · Iris Groen

Keywords: [ navigational affordances ] [ scene perception ] [ DNNs ] [ path drawings ]


Abstract:

Moving through the world requires extracting navigational affordances from the immediate visual environment. How do humans compute this information from visual inputs? Over the last decade, Deep Neural Networks (DNNs) trained on visual recognition tasks have been shown to predict human perception remarkably well in the domain of object recognition, but their alignment with humans in other visual task contexts, such as spatial navigation, remains less well understood. Here, we investigated the alignment of DNNs with human-perceived navigational affordances in a broad variety of visual environments by using explainable AI and different model training objectives. We curated a diverse set of naturalistic real-world indoor, outdoor man-made, and outdoor natural scenes. For each scene, we gathered human annotations identifying the objects present and collected drawings of the path trajectories that participants would take through the scene. Quantitative analysis of the path annotations shows that participants perceive and choose similar paths in each environment, which indicates that diagnostic features for navigational affordances are present in the images. Using representational similarity analysis, we found that DNN features exhibit low correlations with information relevant to navigational affordances, such as mean pathways and floor segmentation. They show slightly higher correlations with estimated depth information. However, all of these correlations are substantially lower than the correlations with the representational space of the objects contained in the scenes. These findings indicate that DNNs represent object information rather than navigational affordances. This highlights that our path annotations are a rich and challenging benchmark for studying human-DNN alignment and that commonly used DNNs do not yet capture navigational affordance representations well.
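
The representational similarity analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' pipeline, so everything here is an assumption made for illustration: the function names `rdm` and `rsa_score` are hypothetical, correlation distance and Spearman correlation are common but not confirmed choices, and all arrays are random placeholder data rather than the study's stimuli or results.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(features):
    """Condensed representational dissimilarity matrix (RDM).

    Each row of `features` describes one scene. Dissimilarity is
    1 - Pearson correlation between every pair of rows (assumed metric).
    """
    return pdist(features, metric="correlation")


def rsa_score(features_a, features_b):
    """Spearman correlation between the RDMs of two feature spaces."""
    rho, _ = spearmanr(rdm(features_a), rdm(features_b))
    return rho


# Placeholder data only: 20 hypothetical scenes.
rng = np.random.default_rng(0)
n_scenes = 20
dnn_features = rng.normal(size=(n_scenes, 64))  # stand-in for DNN layer activations
path_maps = rng.normal(size=(n_scenes, 100))    # stand-in for flattened mean path drawings
# Stand-in for object annotations, built to share structure with the DNN features.
object_features = dnn_features[:, :32] + 0.1 * rng.normal(size=(n_scenes, 32))

score_paths = rsa_score(dnn_features, path_maps)
score_objects = rsa_score(dnn_features, object_features)
score_self = rsa_score(dnn_features, dnn_features)

print(f"DNN vs. path maps:       {score_paths:.2f}")
print(f"DNN vs. object features: {score_objects:.2f}")
```

Because the placeholder object features are constructed from the DNN features, the second score comes out higher than the first by design. This only mirrors the qualitative pattern described in the abstract and says nothing about the actual effect sizes.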
