Assessing Generalization of SGD via Disagreement

Yiding Jiang · Vaishnavh Nagarajan · Christina Baek · Zico Kolter


Keywords: [ stochastic gradient descent ] [ generalization ] [ deep learning ]

[ Abstract ]
Spotlight presentation: Wed 27 Apr 10:30 a.m. PDT — 12:30 p.m. PDT


We empirically show that the test error of deep networks can be estimated by training the same architecture on the same training set with two different runs of Stochastic Gradient Descent (SGD), and then measuring the disagreement rate between the two networks on unlabeled test data. This builds on, and is a stronger version of, the observation of Nakkiran & Bansal (2020), which requires the two runs to be trained on separate training sets. We further show theoretically that this peculiar phenomenon arises from the well-calibrated nature of ensembles of SGD-trained models. This finding not only provides a simple empirical measure that directly predicts test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
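The estimator described in the abstract can be sketched in a few lines: given the predicted labels from two independent SGD runs on the same unlabeled test inputs, the disagreement rate is simply the fraction of inputs on which the two runs' predictions differ. The function name and the example label arrays below are illustrative, not from the paper.

```python
import numpy as np

def disagreement_rate(preds_a, preds_b):
    """Fraction of unlabeled test inputs where the two runs' predicted labels differ.

    Under the paper's hypothesis, this quantity approximates the test error
    of either run, without requiring ground-truth labels.
    """
    preds_a = np.asarray(preds_a)
    preds_b = np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))

# Hypothetical predicted class labels from two independent SGD runs
run_a = [0, 1, 1, 2, 0, 1, 2, 2]
run_b = [0, 1, 2, 2, 0, 0, 2, 2]
print(disagreement_rate(run_a, run_b))  # 0.25: the runs differ on 2 of 8 inputs
```

In practice the two prediction arrays would come from two networks trained with different random seeds (initialization and data ordering), evaluated on the same held-out unlabeled inputs.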
