Poster

Learning Hierarchical Polynomials of Multiple Nonlinear Features

Hengyu Fu · Zihao Wang · Eshaan Nichani · Jason Lee

Hall 3 + Hall 2B #340
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: In deep learning theory, a central question is how neural networks learn hierarchical features. In this work, we study the learning of hierarchical polynomials of multiple nonlinear features using three-layer neural networks. We consider a broad class of functions of the form $f = g \circ p$, where $p : \mathbb{R}^d \to \mathbb{R}^r$ represents multiple quadratic features with $r \ll d$ and $g : \mathbb{R}^r \to \mathbb{R}$ is a polynomial of degree $p$. This can be viewed as a nonlinear generalization of the multi-index model, and also as an extension of prior work on nonlinear feature learning that considered only a single feature (i.e. $r = 1$). Our primary contribution shows that a three-layer neural network trained via layerwise gradient descent suffices for (i) complete recovery of the subspace spanned by the nonlinear features, and (ii) efficient learning of the target function $f = g \circ p$, or transfer learning of $f = g' \circ p$ with a different link function, within $\widetilde{O}(d^4)$ samples and polynomial time. For such hierarchical targets, our result substantially improves on the $\Theta(d^{2p})$ sample complexity of kernel methods, demonstrating the power of efficient feature learning. Our results leverage novel techniques and go beyond all prior settings, such as single-index and multi-index models as well as models depending on a single nonlinear feature, contributing to a more comprehensive understanding of feature learning in deep learning.
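To make the function class concrete, here is a minimal numpy sketch of a hierarchical target $f = g \circ p$ with $r$ quadratic features in ambient dimension $d$. The specific dimensions, the random symmetric matrices $A_k$ defining the features, and the particular degree-3 link polynomial are all hypothetical choices for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 20, 3  # ambient dimension d, number of nonlinear features r (r << d)

# r quadratic features p_k(x) = x^T A_k x, with symmetric A_k (hypothetical)
A = rng.standard_normal((r, d, d))
A = (A + A.transpose(0, 2, 1)) / 2  # symmetrize each A_k

def p(x):
    """Quadratic feature map p: R^d -> R^r, p_k(x) = x^T A_k x."""
    return np.einsum('i,kij,j->k', x, A, x)

def g(z):
    """A polynomial link g: R^r -> R (an arbitrary degree-3 example)."""
    return z[0] ** 3 - 2.0 * z[0] * z[1] + z[2]

def f(x):
    """Hierarchical target f = g o p: a polynomial of the r features."""
    return g(p(x))

x = rng.standard_normal(d)
features = p(x)   # shape (r,)
value = f(x)      # scalar
```

The learning problem studied in the abstract is to recover the span of the $A_k$ (equivalently, the feature subspace) and fit the link $g$ from samples $(x, f(x))$, which a three-layer network trained layerwise can do far more sample-efficiently than a kernel method.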
