

Poster

Learning High-Degree Parities: The Crucial Role of the Initialization

Emmanuel Abbe · Elisabetta Cornacchia · Jan Hązła · Donald Kougang Yombi

Hall 3 + Hall 2B #438
Thu 24 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract: Parities have become a standard benchmark for evaluating learning algorithms. Recent works show that regular neural networks trained by gradient descent can efficiently learn degree $k$ parities on uniform inputs for constant $k$, but fail to do so when $k$ and $d-k$ grow with $d$ (here $d$ is the ambient dimension). However, the case where $k = d - O_d(1)$, including the degree $d$ parity (the full parity), has remained unsettled. This paper shows that for gradient descent on regular neural networks, learnability depends on the initial weight distribution. On one hand, the discrete Rademacher initialization enables efficient learning of almost-full parities, while on the other hand, its Gaussian perturbation with a large enough constant standard deviation $\sigma$ prevents it. The positive result for almost-full parities is shown to hold up to $\sigma = O(d^{-1})$, pointing to questions about a sharper threshold phenomenon. Unlike statistical query (SQ) learning, where a singleton function class like the full parity is trivially learnable, our negative result applies to a fixed function and relies on an initial gradient alignment measure of potential broader relevance to neural network learning.
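For concreteness, here is a minimal sketch of the objects named in the abstract: a degree-$k$ parity target on uniform $\pm 1$ inputs, a Rademacher first-layer initialization, and its Gaussian perturbation with standard deviation $\sigma$. The dimensions `d`, `k`, the width, the value of `sigma`, and the two-layer ReLU architecture are illustrative assumptions, not the paper's exact experimental setup.

```python
import torch

d, k, n = 50, 48, 1024   # ambient dimension, parity degree (almost-full), sample count (assumed values)
sigma = 0.5              # standard deviation of the Gaussian perturbation (illustrative)
width = 256              # hidden-layer width of the illustrative network

# Uniform +/-1 inputs; the degree-k parity label is the product of the first k coordinates.
X = torch.randint(0, 2, (n, d), dtype=torch.float32) * 2 - 1
y = X[:, :k].prod(dim=1)

# Rademacher initialization: each first-layer weight drawn uniformly from {-1, +1}.
W_rad = torch.randint(0, 2, (width, d), dtype=torch.float32) * 2 - 1

# Its Gaussian perturbation with constant standard deviation sigma.
W_pert = W_rad + sigma * torch.randn(width, d)

def two_layer_net(W1):
    """Two-layer ReLU network whose first layer is set to the given initialization."""
    net = torch.nn.Sequential(
        torch.nn.Linear(d, width),
        torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )
    with torch.no_grad():
        net[0].weight.copy_(W1)
    return net
```

The contrast studied in the paper is between training from `W_rad` and from `W_pert`: per the abstract, the former enables efficient learning of almost-full parities, while a large enough constant `sigma` in the latter prevents it.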
