Poster in Affinity Workshop: Blog Track Session 8
Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
Rylan Schaeffer · Zachary Robertson · Akhilan Boopathy · Mikail Khona · Kateryna Pistunova · Jason Rocks · Ila Fiete · Andrey Gromov · Sanmi Koyejo
Halle B #2
Double descent is a surprising phenomenon in machine learning in which, as the number of model parameters grows relative to the number of training data, test error first rises to a peak near the interpolation threshold (where the number of parameters roughly equals the number of training data) and then drops as models grow ever larger into the highly overparameterized (data-undersampled) regime. This drop in test error runs counter to classical learning theory on overfitting and has arguably underpinned the success of many large models in machine learning. In this work, we analytically dissect the simple setting of ordinary linear regression and show, both intuitively and rigorously, when and why double descent occurs, without complex tools (e.g., statistical mechanics or random matrix theory). We identify three interpretable factors that, when all present simultaneously, cause double descent: (1) how much the training features vary in each direction; (2) how much, and in which directions, the test features vary relative to the training features; and (3) how well the best possible model in the model class can correlate the variance in the training features with the training targets. We demonstrate on real data that ordinary linear regression exhibits double descent, and that double descent disappears when we ablate any one of the three identified factors. We conclude by using this fresh perspective to shed light on recent observations in nonlinear models concerning superposition and double descent.
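One way to see where the three factors come from is a short derivation. This is a sketch under our own notation and assumptions, not necessarily the exact formulation used in the work: training features $X \in \mathbb{R}^{N \times D}$ with thin SVD $X = U \Sigma V^\top$ of rank $R$ (singular values $\sigma_r$, left and right singular vectors $\vec{u}_r$, $\vec{v}_r$), targets $Y = X\beta^* + E$ where $\beta^*$ is the best possible linear model and $E$ holds its residuals, and the minimum-norm least-squares fit $\hat{\beta} = X^{+}Y$ with $X^{+} = V\Sigma^{-1}U^\top$.

```latex
\hat{\beta} = X^{+}(X\beta^* + E) = X^{+}X\beta^* + V\Sigma^{-1}U^\top E

\vec{x}_{\text{test}} \cdot \hat{\beta} - \vec{x}_{\text{test}} \cdot \beta^*
  = \sum_{r=1}^{R} \frac{1}{\sigma_r}
      \left(\vec{x}_{\text{test}} \cdot \vec{v}_r\right)
      \left(\vec{u}_r \cdot E\right)
  + \vec{x}_{\text{test}} \cdot \left(X^{+}X - I_D\right)\beta^*
```

When $N \ge D$ and $X$ has full column rank, $X^{+}X = I_D$ and the last (bias) term vanishes; when $N < D$ and $X$ has full row rank, $X^{+}X = X^\top (X X^\top)^{-1} X$ is only a projection onto the row space, so the bias term remains. In the sum, $1/\sigma_r$ reflects factor (1), $\vec{x}_{\text{test}} \cdot \vec{v}_r$ reflects factor (2), and $\vec{u}_r \cdot E$ reflects factor (3); the error can only blow up when a small $\sigma_r$ (typical near $N \approx D$) coincides with nonzero values of the other two, and zeroing any one factor removes that term.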
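A minimal simulation sketch of double descent in ordinary linear regression, assuming isotropic Gaussian features, a random ground-truth model $\beta^*$, Gaussian label noise standing in for the best model's residuals, and a minimum-norm least-squares fit. This is synthetic data, not the real datasets used in the work, and the helpers `min_norm_fit` and `avg_test_error` are hypothetical names introduced here. Setting `noise=0` ablates factor (3), because the best model then fits the training targets exactly; the singular-value `cutoff` is a crude stand-in for ablating factor (1) by discarding low-variance training directions.

```python
import numpy as np


def min_norm_fit(X, y, cutoff=0.0):
    """Minimum-norm least-squares fit beta = V diag(1/s) U^T y.

    Singular directions with s <= cutoff are discarded. With cutoff=0 every
    nonzero direction is kept, i.e. the Moore-Penrose pseudoinverse solution.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > cutoff
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])


def avg_test_error(n_train, dim=20, noise=0.5, cutoff=0.0,
                   n_test=2000, n_trials=50, seed=0):
    """Mean squared error of the fit relative to the best model's predictions."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        # Hypothetical ground truth: the best possible linear model beta*.
        beta_star = rng.standard_normal(dim) / np.sqrt(dim)
        # Isotropic Gaussian training features (an assumption, not real data).
        X = rng.standard_normal((n_train, dim))
        # Label noise plays the role of the best model's residuals E.
        y = X @ beta_star + noise * rng.standard_normal(n_train)
        X_test = rng.standard_normal((n_test, dim))
        beta_hat = min_norm_fit(X, y, cutoff)
        errors.append(np.mean((X_test @ (beta_hat - beta_star)) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    # The number of parameters is fixed at dim=20; sweep the number of data.
    for n in range(2, 61, 2):
        print(n, round(avg_test_error(n), 3),
              round(avg_test_error(n, noise=0.0), 3))
```

With `dim=20`, the noisy curve should spike near `n_train=20` and fall on either side, while the `noise=0` curve decreases without a spike. Fixing the parameter count and varying the number of training data is one convenient way to sweep their ratio.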