Poster
Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale
Alaa Khaddaj · Logan Engstrom · Aleksander Madry
Hall 3 + Hall 2B #565
The choice of training data distribution greatly influences model behavior. Yet, in large-scale settings, precisely characterizing how changes in training data affect predictions is often difficult due to model training costs. Current practice is to instead extrapolate from scaled-down, inexpensive-to-train proxy models. However, changes in data do not influence smaller and larger models identically. Therefore, understanding how the choice of data affects large-scale models raises the question: how does training data distribution influence model behavior across compute scale? We find that small- and large-scale language model predictions (generally) do highly correlate across choice of training data. Equipped with these findings, we characterize how proxy scale affects effectiveness in two downstream proxy model applications: data attribution and dataset selection.
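To make the core measurement concrete, here is a minimal sketch (not the authors' code) of how one might check whether a small proxy model's verdicts on candidate training distributions transfer to a large model: compare the two models' evaluation metrics across the same set of training-data choices via rank correlation. The loss arrays below are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-distribution metrics: entry i is each model's
# evaluation loss after training on candidate data distribution i.
small_model_losses = np.array([2.91, 3.10, 2.85, 3.42, 3.05])
large_model_losses = np.array([2.10, 2.31, 2.04, 2.66, 2.25])

# Rank correlation across training-data choices: a high value means the
# small proxy orders candidate datasets the same way the large model does,
# i.e., the proxy's conclusions generalize across compute scale.
rho, pval = spearmanr(small_model_losses, large_model_losses)
print(f"Spearman rho across data choices: {rho:.3f} (p={pval:.3g})")
```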