6 Results
Workshop | | Implicit Bias and Fast Convergence Rates for Self-attention | Bhavya Vasudeva · Puneesh Deora · Christos Thrampoulidis
Poster | Thu 1:45 | Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? | Tokio Kajitsuka · Issei Sato
Poster | Thu 1:45 | One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention | Arvind Mahankali · Tatsunori Hashimoto · Tengyu Ma
Poster | Tue 7:30 | Distinguished In Uniform: Self-Attention Vs. Virtual Nodes | Eran Rosenbluth · Jan Tönshoff · Martin Ritzert · Berke Kisin · Martin Grohe
Workshop | | Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation | Hui Wei · Maxwell Xu · Colin Samplawski · James Rehg · Santosh Kumar · Benjamin M Marlin
Poster | Thu 7:30 | Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners | Sarthak Yadav · Sergios Theodoridis · Lars Kai Hansen · Zheng-Hua Tan