

Invited Talk
in
Workshop: Bridging the Gap Between Practice and Theory in Deep Learning

Invited Talk 1: Theory on Training Dynamics of Transformers

Yingbin Liang


Abstract:

Speaker: Yingbin Liang

Transformers, as foundation models, have recently revolutionized many machine learning (ML) applications such as natural language processing, computer vision, and robotics. Alongside these tremendous empirical successes, a compelling question arises about the theoretical foundations of the training dynamics of transformer-based ML models; in particular, why transformers trained with the common routine of gradient descent can achieve the desired performance. In this talk, I will present our recent results in this direction through two case studies: linear regression in in-context learning and masked image modeling in self-supervised learning. For both problems, we analyze the convergence of the training process for one-layer transformers and characterize the optimality of the attention models upon convergence. Our numerical results further corroborate these theoretical insights. Lastly, I will discuss future directions and open problems in this actively evolving field.
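To make the first case study concrete, below is a minimal sketch of the in-context linear regression setup often used in this line of theory: each prompt contains examples (x_i, y_i) from a fresh linear task, and a one-layer attention model is trained by plain gradient descent to predict the label of a query point. This is an illustrative toy implementation, not the talk's exact formulation; the class and function names (OneLayerLinearAttention, sample_prompts), the linear-attention parameterization, the token embedding [x_i; y_i], and all hyperparameter values are assumptions made for the example.

```python
# Minimal sketch of in-context linear regression with a one-layer attention model.
# Assumptions: Gaussian tasks and features, linear (softmax-free) attention, and the
# token/readout conventions below; these are illustrative, not the talk's exact setup.
import torch

d, n_ctx, batch = 5, 20, 256  # feature dim, context length, tasks per gradient step

class OneLayerLinearAttention(torch.nn.Module):
    """Single-head linear attention acting on tokens z_i = [x_i; y_i]."""
    def __init__(self, dim):
        super().__init__()
        self.WQ = torch.nn.Linear(dim, dim, bias=False)
        self.WK = torch.nn.Linear(dim, dim, bias=False)
        self.WV = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, Z, zq):
        # Z: (batch, n_ctx, dim) context tokens; zq: (batch, dim) query token [x_q; 0]
        scores = torch.einsum('bd,bnd->bn', self.WQ(zq), self.WK(Z)) / Z.shape[1]
        out = torch.einsum('bn,bnd->bd', scores, self.WV(Z))
        return out[:, -1]  # read the prediction off the label coordinate

def sample_prompts():
    # Each prompt carries a fresh regression task w, n_ctx labeled examples, and a query.
    w = torch.randn(batch, d)
    X = torch.randn(batch, n_ctx + 1, d)
    y = torch.einsum('bd,bnd->bn', w, X)
    Z = torch.cat([X[:, :n_ctx], y[:, :n_ctx, None]], dim=-1)      # tokens [x_i; y_i]
    zq = torch.cat([X[:, n_ctx], torch.zeros(batch, 1)], dim=-1)   # query token [x_q; 0]
    return Z, zq, y[:, n_ctx]

model = OneLayerLinearAttention(d + 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(2000):  # gradient descent on the population-style loss via fresh prompts
    Z, zq, y_query = sample_prompts()
    loss = ((model(Z, zq) - y_query) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The theoretical question studied in this setting is how such gradient-descent training behaves over time and what attention weights it converges to; the sketch only reproduces the training loop, not the analysis.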
