Poster

Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models

Chenyang Zhang · Qingyue Zhao · Quanquan Gu · Yuan Cao

Abstract
