Recent work has shown a strong separation between the expressive power of depth-2 and depth-3 neural networks. These separation results exhibit a function and an input distribution such that the function is well-approximable in L2-norm on the input distribution by a depth-3 neural network of polynomial size, but any depth-2 neural network that well-approximates it requires exponential size. A limitation of these results is that they work only for certain careful choices of functions and input distributions that are arguably not natural enough.
We provide a simple proof of L2-norm separation between the expressive power of depth-2 and depth-3 sigmoidal neural networks for a large class of input distributions, assuming their weights are polynomially bounded. Our proof is simpler than the proofs of previous results, uses known low-degree multivariate polynomial approximations to neural networks, and gives the first depth-2-vs-depth-3 separation that works for a large class of input distributions.