Session | Time | Title | Authors
Spotlight | Mon 12:15 | On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers | Kenji Kawaguchi
Poster | Mon 9:00 | Multi-Level Local SGD: Distributed SGD for Heterogeneous Hierarchical Networks | Timothy Castiglia · Anirban Das · Stacy Patterson
Poster | Mon 17:00 | When does preconditioning help or hurt generalization? | Shun-ichi Amari · Jimmy Ba · Roger Grosse · Xuechen Li · Atsushi Nitanda · Taiji Suzuki · Denny Wu · Ji Xu
Poster | Tue 17:00 | A unifying view on implicit bias in training linear neural networks | Chulhee Yun · Shankar Krishnan · Hossein Mobahi
Poster | Thu 17:00 | How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks? | Zixiang Chen · Yuan Cao · Difan Zou · Quanquan Gu
Poster | Wed 1:00 | Byzantine-Resilient Non-Convex Stochastic Gradient Descent | Zeyuan Allen-Zhu · Faeze Ebrahimianghazani · Jerry Li · Dan Alistarh
Spotlight | Wed 5:15 | Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods | Taiji Suzuki · Shunta Akiyama
Poster | Mon 17:00 | Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods | Taiji Suzuki · Shunta Akiyama
Poster | Tue 1:00 | Computational Separation Between Convolutional and Fully-Connected Networks | Eran Malach · Shai Shalev-Shwartz
Poster | Tue 1:00 | Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent | El Mahdi El Mhamdi · Rachid Guerraoui · Sébastien Rouault
Poster | Wed 17:00 | Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation | Tanner Fiez · Lillian J Ratliff
Poster | Thu 1:00 | A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima | Zeke Xie · Issei Sato · Masashi Sugiyama