

Poster

Deep Networks Learn Features From Local Discontinuities in the Label Function

Prithaj Banerjee · Harish Ramaswamy · Mahesh Yadav · Chandra Shekar Lakshminarayanan

Hall 3 + Hall 2B #359
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Deep neural networks outperform kernel machines on several datasets due to feature learning that happens during gradient descent training. In this paper, we analyze the mechanism through which feature learning happens, using a notion of features that corresponds to discontinuities in the true label function. We hypothesize that the core feature-learning mechanism is label function discontinuities attracting model function discontinuities during training. To test this hypothesis, we perform experiments on classification data where the true label function is given by an oblique decision tree. This setup allows easy enumeration of the label function's discontinuities while remaining intractable for static kernel and linear methods. We then design a novel deep architecture, the Deep Linearly Gated Network (DLGN), whose discontinuities in the input space can be easily enumerated. In this setup, we provide supporting evidence that model function discontinuities move towards label function discontinuities during training. The easy enumerability of discontinuities in the DLGN also enables greater mechanistic interpretability, which we demonstrate by extracting the parameters of a high-accuracy decision tree from the parameters of a trained DLGN. Finally, we show that the DLGN is competitive with ReLU networks and other tree-learning algorithms on several real-world tabular datasets.
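
As a concrete illustration of the ground-truth setup described above, the sketch below implements an oblique decision tree label function: each internal node splits on a hyperplane, so the label function's discontinuities are exactly (pieces of) the split hyperplanes and can be read off from the node parameters. The depth, split weights, and leaf labels here are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative oblique decision tree of depth 2. Every internal node splits
# on a hyperplane w . x > b, so the label function is piecewise constant and
# its discontinuities are exactly (pieces of) the split hyperplanes, which
# can be enumerated from the node parameters. All values here are made up.
splits = {
    "root": (np.array([1.0, -1.0]), 0.0),
    "L":    (np.array([0.5,  1.0]), 0.3),
    "R":    (np.array([-1.0, 0.2]), -0.1),
}
leaf_labels = {"LL": 0, "LR": 1, "RL": 1, "RR": 0}

def tree_label(x):
    """Route x through the two levels of splits and return its class label."""
    w, b = splits["root"]
    side = "R" if w @ x > b else "L"
    w, b = splits[side]
    return leaf_labels[side + ("R" if w @ x > b else "L")]

print(tree_label(np.array([0.2, -0.4])))  # -> 1
```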

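The abstract does not define the DLGN's internals, so the following is a minimal sketch of one plausible reading of a "linearly gated" network: gate pre-activations come from a deep linear path, hence each gate's discontinuity is a hyperplane in input space that can be enumerated, and the gates multiply a separate value path. The widths, depth, initialization, variable names, and soft-gate temperature `beta` are all assumptions for illustration.

```python
import numpy as np

# Hypothetical DLGN-style forward pass (the abstract does not spell out the
# architecture). Assumption: a deep *linear* gating path, so gate
# pre-activations stay linear in the input x and every gate's discontinuity
# is a hyperplane; its soft gates multiply a separate linear value path.
# Widths, depth, init, and the temperature `beta` are illustrative choices.
rng = np.random.default_rng(0)
d, width, depth = 4, 8, 3

Wg = [rng.standard_normal((width, d))] + \
     [rng.standard_normal((width, width)) for _ in range(depth - 1)]
Wv = [rng.standard_normal((width, d))] + \
     [rng.standard_normal((width, width)) for _ in range(depth - 1)]
w_out = rng.standard_normal(width)

def dlgn_forward(x, beta=10.0):
    """Soft gates sigmoid(beta * pre-activation) keep training
    differentiable and approach hard 0/1 gates as beta grows."""
    g, h = x, x
    for Wg_l, Wv_l in zip(Wg, Wv):
        g = Wg_l @ g                              # linear, so linear in x
        gate = 1.0 / (1.0 + np.exp(-beta * g))    # soft gate per unit
        h = gate * (Wv_l @ h)                     # gated value path
    return w_out @ h

# Because the gating path is linear, the layer-l pre-activations equal
# (Wg[l-1] @ ... @ Wg[0]) @ x: each row of that product is the normal of
# one gate's discontinuity hyperplane, so all depth * width of them can
# be enumerated directly.
M, hyperplanes = np.eye(d), []
for Wg_l in Wg:
    M = Wg_l @ M
    hyperplanes.extend(M)
print(dlgn_forward(rng.standard_normal(d)), len(hyperplanes))
```

Under this reading, with hard 0/1 gates the model function can jump only where a gate pre-activation crosses zero, so its discontinuity set is contained in the enumerated hyperplanes; this is what would make it tractable to track their movement toward the tree's split hyperplanes during training.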