Depth separation and weight-width trade-offs for sigmoidal neural networks

Workshop

Amit Jayant Deshpande · Navin Goyal

East Meeting Level 8 + 15 #7

Wed 2 May, 11 a.m. PDT

Recent work has shown a strong separation between the expressive power of depth-$2$ and depth-$3$ neural networks. These separation results exhibit a function and an input distribution such that the function is well-approximable in $L_{2}$-norm on the input distribution by a depth-$3$ neural network of polynomial size, but any depth-$2$ neural network that well-approximates it requires exponential size. A limitation of these results is that they work only for certain careful choices of functions and input distributions that are arguably not natural enough. We provide a simple proof of $L_{2}$-norm separation between the expressive power of depth-$2$ and depth-$3$ sigmoidal neural networks for a large class of input distributions, assuming their weights are polynomially bounded. Our proof is simpler than those of previous results, uses known low-degree multivariate polynomial approximations to neural networks, and gives the first depth-$2$-vs-depth-$3$ separation that works for a large class of input distributions.
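
For readers unfamiliar with the terminology, the $L_{2}$-norm approximation error on an input distribution can be written as below. This is the standard definition rather than notation taken from the paper: the symbols $f$ (target function), $g$ (network), $\mu$ (input distribution), $\sigma$ (sigmoidal activation), $k$ (number of hidden units), and $a_i, w_i, b_i$ (weights and biases) are assumptions introduced here for illustration, and the form of $g$ assumes the common convention that a depth-$2$ network has a single hidden layer.

$$
\| f - g \|_{L_{2}(\mu)} = \left( \mathbb{E}_{x \sim \mu} \left[ \left( f(x) - g(x) \right)^{2} \right] \right)^{1/2},
\qquad
g(x) = \sum_{i=1}^{k} a_i \, \sigma\!\left( \langle w_i, x \rangle + b_i \right).
$$

Under this reading, a separation result says that some depth-$3$ network of polynomial size makes this error small, while every depth-$2$ network $g$ of the stated kind needs exponentially large $k$ to do so.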
