6 Results
Workshop | | Implicit Bias and Fast Convergence Rates for Self-attention | Bhavya Vasudeva · Puneesh Deora · Christos Thrampoulidis
Poster | Thu 1:45 | Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? | Tokio Kajitsuka · Issei Sato
Poster | Thu 1:45 | One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention | Arvind Mahankali · Tatsunori Hashimoto · Tengyu Ma
Poster | Tue 7:30 | Distinguished In Uniform: Self-Attention Vs. Virtual Nodes | Eran Rosenbluth · Jan Tönshoff · Martin Ritzert · Berke Kisin · Martin Grohe
Workshop | | Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation | Hui Wei · Maxwell Xu · Colin Samplawski · James Rehg · Santosh Kumar · Benjamin M Marlin
Poster | Thu 7:30 | Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners | Sarthak Yadav · Sergios Theodoridis · Lars Kai Hansen · Zheng-Hua Tan