Poster
Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale
Alaa Khaddaj · Logan Engstrom · Aleksander Madry
Hall 3 + Hall 2B #565
The choice of training data distribution greatly influences model behavior. Yet, in large-scale settings, precisely characterizing how changes in training data affect predictions is often difficult due to model training costs. Current practice is to instead extrapolate from scaled-down, inexpensive-to-train proxy models. However, changes in data do not influence smaller and larger models identically. Therefore, understanding how the choice of data affects large-scale models raises the question: how does training data distribution influence model behavior across compute scale? We find that small- and large-scale language model predictions (generally) do highly correlate across choice of training data. Equipped with these findings, we characterize how proxy scale affects effectiveness in two downstream proxy model applications: data attribution and dataset selection.
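To make the core measurement concrete, here is a minimal sketch (not the authors' code) of how one might check whether a small proxy model's verdicts on candidate training distributions transfer to a large model: compare the two models' evaluation metrics across the same set of training-data choices via rank correlation. The loss arrays below are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-distribution metrics: entry i is each model's
# evaluation loss after training on candidate data distribution i.
small_model_losses = np.array([2.91, 3.10, 2.85, 3.42, 3.05])
large_model_losses = np.array([2.10, 2.31, 2.04, 2.66, 2.25])

# Rank correlation across training-data choices: a high value means the
# small proxy orders candidate datasets the same way the large model does,
# i.e., the proxy's conclusions generalize across compute scale.
rho, pval = spearmanr(small_model_losses, large_model_losses)
print(f"Spearman rho across data choices: {rho:.3f} (p={pval:.3g})")
```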