Poster

DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD

Xianbiao Qi · Marco Chen · Wenjie Xiao · Jiaquan Ye · Yelin He · Chun-Guang Li · Zhouchen Lin

Abstract
