Poster

How do we interpret the outputs of a neural network trained on classification?

Yudi Xie

Hall 3 + Hall 2B #335
[ Project Page ]
Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Deep neural networks are widely used for classification tasks, but the interpretation of their output activations is often unclear. This post explains how these outputs can be understood as approximations of the Bayesian posterior probability. We show that, in theory, the loss function for classification tasks, derived via maximum likelihood, is minimized by the Bayesian posterior. We conduct empirical studies in which neural networks are trained to classify synthetic data from a known generative model. In a simple classification task, the network closely approximates the theoretically derived posterior. However, simple changes to the task can make accurate approximation much more difficult. The model's ability to approximate the posterior depends on multiple factors, such as the complexity of the posterior and whether sufficient data is available for learning.
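The theoretical claim rests on a standard identity, sketched below with notation not in the abstract itself: for data distribution p(x, y) and network output q(y | x), the expected cross-entropy decomposes into an entropy term plus a KL divergence, so it is minimized exactly when the network outputs the Bayesian posterior.

```latex
% Expected cross-entropy over the data distribution p(x, y),
% with network output q(y \mid x):
\mathbb{E}_{(x,y)\sim p}\!\left[-\log q(y \mid x)\right]
  = \mathbb{E}_{x}\!\left[\, H\!\big(p(\cdot \mid x)\big)
    + \mathrm{KL}\!\big(p(\cdot \mid x)\,\big\|\,q(\cdot \mid x)\big) \right]
% The KL term is nonnegative and vanishes iff q(y \mid x) = p(y \mid x),
% so the loss is minimized by the Bayesian posterior.
```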
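A minimal sketch of the kind of empirical study described, assuming a toy two-Gaussian generative model; the architecture, hyperparameters, and `true_posterior` helper are illustrative choices, not the authors' setup. It trains a small classifier with the maximum-likelihood (cross-entropy) objective and compares its softmax outputs with the analytic posterior p(y | x).

```python
# Hypothetical sketch (not the authors' code): train a classifier on 1-D
# synthetic data from a known generative model, then compare its softmax
# outputs with the analytic Bayesian posterior.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generative model: y ~ Bernoulli(0.5), x | y ~ N(mu[y], 1).
mu = torch.tensor([-1.0, 1.0])
n = 10_000
y = torch.randint(0, 2, (n,))
x = mu[y] + torch.randn(n)

def true_posterior(x):
    # p(y = 1 | x) for equal priors and unit-variance Gaussians.
    log_ratio = ((x - mu[0]) ** 2 - (x - mu[1]) ** 2) / 2
    return torch.sigmoid(log_ratio)

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # maximum-likelihood objective

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(net(x.unsqueeze(1)), y)
    loss.backward()
    opt.step()

# Compare learned softmax outputs with the analytic posterior on a grid.
grid = torch.linspace(-4, 4, 9)
with torch.no_grad():
    learned = torch.softmax(net(grid.unsqueeze(1)), dim=1)[:, 1]
for xv, p_hat, p_true in zip(grid.tolist(), learned.tolist(),
                             true_posterior(grid).tolist()):
    print(f"x={xv:+.1f}  net={p_hat:.3f}  posterior={p_true:.3f}")
```

On a simple task like this, with ample data and a smooth posterior, the learned outputs should closely track the analytic curve, in line with the abstract's finding that approximation quality degrades as the posterior grows more complex or data becomes scarce.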
