Poster
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data
Atsushi Nitanda · Kazusato Oko · Taiji Suzuki · Denny Wu
Halle B #222
Abstract:
Recent works have shown that neural networks optimized by gradient-based methods can adapt to sparse or low-dimensional target functions through feature learning; an often studied target is the sparse parity function on the unit hypercube. However, such isotropic data setting does not capture the anisotropy and low intrinsic dimensionality exhibited in realistic datasets. In this work, we address this shortcoming by studying how gradient-based feature learning interacts with structured (anisotropic) input data: we consider the classification of $k$-sparse parity on high-dimensional orthotope where the feature coordinates have varying magnitudes, and analyze the learning complexity of the mean-field Langevin dynamics (MFLD), which describes the noisy gradient descent update on two-layer neural network. We show that the statistical complexity (i.e. sample size) and computational complexity (i.e. network width) of MFLD can both be improved when prominent directions of the anisotropic input data align with the support of the target function. Moreover, by employing a coordinate transform determined by the gradient covariance, the width can be made independent of the target degree $k$. Lastly, we demonstrate the benefit of feature learning by establishing a kernel lower bound on the classification error, which applies to neural networks in the lazy regime.
Chat is not available.