Invited Talk in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Invited Talk 1: Theory on Training Dynamics of Transformers
Speaker: Yingbin Liang
Transformers, as foundation models, have recently revolutionized many machine learning (ML) applications, such as natural language processing, computer vision, and robotics. Alongside their tremendous empirical successes, a compelling question arises about the theoretical foundations of the training dynamics of transformer-based ML models; in particular, why transformers trained by the standard routine of gradient descent can achieve the desired performance. In this talk, I will present our recent results in this direction on two case studies: linear regression in in-context learning and masked image modeling in self-supervised learning. For both problems, we analyze the convergence of the training process over one-layer transformers and characterize the optimality of the attention models upon convergence. Our numerical results further corroborate these theoretical insights. Lastly, I will discuss future directions and open problems in this actively evolving field.
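To make the first case study concrete, the sketch below illustrates one common simplification used in this line of work: a one-layer linear-attention model trained by gradient descent on randomly sampled linear-regression prompts. The single-matrix parameterization `A`, the Gaussian sampling distributions, and all hyperparameters are illustrative assumptions on my part, not the exact setting analyzed in the talk.

```python
import numpy as np

# Minimal sketch (assumed setup, not the talk's exact formulation):
# in-context linear regression with a one-layer linear-attention model,
# trained by gradient descent over randomly drawn regression tasks.

rng = np.random.default_rng(0)
d, N = 5, 40              # feature dimension, number of in-context examples
lr, steps, batch = 0.05, 2000, 64

A = np.zeros((d, d))      # trainable attention parameter (illustrative)

def sample_tasks(batch, d, N):
    """Draw random tasks w and in-context pairs (x_i, y_i) with y_i = <w, x_i>."""
    w = rng.standard_normal((batch, d))
    X = rng.standard_normal((batch, N, d))
    y = np.einsum('bnd,bd->bn', X, w)
    xq = rng.standard_normal((batch, d))        # query input
    yq = np.einsum('bd,bd->b', xq, w)           # target label for the query
    return X, y, xq, yq

for t in range(steps):
    X, y, xq, yq = sample_tasks(batch, d, N)
    h = np.einsum('bnd,bn->bd', X, y) / N       # context summary (value-weighted sum)
    pred = np.einsum('bd,de,be->b', xq, A, h)   # linear-attention prediction x_q^T A h
    err = pred - yq
    # gradient of the mean squared error with respect to A
    grad = 2 * np.einsum('b,bd,be->de', err, xq, h) / batch
    A -= lr * grad

# Fresh evaluation after training
X, y, xq, yq = sample_tasks(1024, d, N)
h = np.einsum('bnd,bn->bd', X, y) / N
pred = np.einsum('bd,de,be->b', xq, A, h)
print("eval MSE:", np.mean((pred - yq) ** 2))
print("learned A (approximately a scaled identity for isotropic inputs):")
print(np.round(A, 2))
```

Under these assumptions, the trained matrix `A` converges to roughly a scaled identity, so the attention layer effectively performs a least-squares-style estimate from the in-context examples; the talk's analysis characterizes such convergence and optimality properties rigorously.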