Sat 12:00 a.m. - 12:15 a.m. | Introduction and opening remarks (Intro)
Sat 12:15 a.m. - 12:45 a.m. | Invited Talk 1: Theory on Training Dynamics of Transformers (Invited Talk) | Yingbin Liang
Sat 12:45 a.m. - 1:00 a.m. | Contributed Talk 1: Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos (Oral) | Dayal Singh Kalra · Tianyu He · Maissam Barkeshli
Sat 1:00 a.m. - 2:00 a.m. | Poster Session 1 and Coffee Break (Poster Session)
Sat 2:00 a.m. - 2:15 a.m. | Contributed Talk 2: Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning (Oral) | Libin Zhu · Chaoyue Liu · Adityanarayanan Radhakrishnan · Misha Belkin
Sat 2:15 a.m. - 2:30 a.m. | Contributed Talk 3: What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks (Oral) | Xingwu Chen · Difan Zou
Sat 2:30 a.m. - 2:45 a.m. | Contributed Talk 4: Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines (Oral) | Yuchen Li · Alexandre Kirchmeyer · Aashay Mehta · Yilong Qin · Boris Dadachev · Kishore Papineni · Sanjiv Kumar · Andrej Risteski
Sat 2:45 a.m. - 3:00 a.m. | Contributed Talk 5: Interpreting Grokked Transformers in Complex Modular Arithmetic (Oral) | Hiroki Furuta · Gouki Minegishi · Yusuke Iwasawa · Yutaka Matsuo
Sat 3:00 a.m. - 5:00 a.m. | Poster Session 2 and Lunch Break (Poster Session)
Sat 5:00 a.m. - 5:30 a.m. | Invited Talk 2: Transformers learn in-context by implementing gradient descent (Invited Talk) | Suvrit Sra
Sat 5:30 a.m. - 6:00 a.m. | Invited Talk 3: Knowledge Distillation as Semiparametric Inference (Invited Talk) | Lester Mackey
Sat 6:00 a.m. - 6:30 a.m. | Invited Talk 4: Emergence of unexpected complex skills in LLMs: Some theory and experiments (Invited Talk)
Sat 6:30 a.m. - 7:00 a.m. | Poster Session 3 and Coffee Break (Poster Session)
Sat 7:00 a.m. - 7:30 a.m. | Invited Talk 5: Capitalizing on Generative AI: Diffusion Models Towards High-Dimensional Generative Optimization (Invited Talk) | Mengdi Wang
- | ResNet-Induced Convex Integration Model: a Novel Robust Framework through the Regularity of Optimal Transport (Poster) | Kuo Gai · Sicong Wang · Shihua Zhang
- | How Can Graph Neural Networks Learn to Perform Latent Representation Inference? (Poster) | Hongyi Guo · Zhaoran Wang
- | A resource model for neural scaling laws (Poster) | Jinyeop Song · Ziming Liu · Max Tegmark · Jeff Gore
- | Implicit Bias and Fast Convergence Rates for Self-attention (Poster) | Bhavya Vasudeva · Puneesh Deora · Christos Thrampoulidis
- | Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape (Poster) | Juno Kim · Taiji Suzuki
- | Generalization Bounds for Magnitude Based Pruning (Poster) | Etash Guha · Prasanjit Dubey · Xiaoming Huo
- | Fundamental Benefit of Alternating Updates in Minimax Optimization (Poster) | Jaewook Lee · Hanseul Cho · Chulhee Yun
- | GNN-VPA: A Variance-Preserving Aggregation Strategy for Graph Neural Networks (Poster) | Lisa Schneckenreiter · Richard Freinschlag · Florian Sestak · Johannes Brandstetter · Günter Klambauer · Andreas Mayr
- | Implicit regularization of multi-task learning and finetuning in overparameterized neural networks (Poster) | Jack Lindsey · Samuel Lippl
- | Long-Range Synthetic Knowledge Graph Benchmarks for Double-Equivariant Models (Poster) | Bruna Jasinowodolinski · Yucheng Zhang · Jincheng Zhou · Beatrice Bevilacqua · Bruno Ribeiro
- | Meta Prompting for AGI Systems (Poster) | Yifan Zhang · Yang Yuan · Andrew Yao
- | Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay (Poster) | Leyan Pan · Xinyuan Cao
- | Momentum Gradient Descent over single-neuron linear network: rich behaviors of limiting Sharpness (Poster) | WenJie Zhou · Bohan Wang · Wei Chen · Zhi-Ming Ma · Xueqi Cheng
- | Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization (Poster) | Shuo Xie · Zhiyuan Li
- | On the Diminishing Returns of Width for Continual Learning (Poster) | Etash Guha · Vihan Lakshman
- | Distributed Reward-Free Exploration: A Provably Efficient Policy Optimization Algorithm (Poster) | Hongyi Guo · Zhuoran Yang · Zhaoran Wang
- | Conformal Prediction Sets Improve Human Decision Making (Poster) | Jesse Cresswell · Yi Sui · Bhargava Kumar · Noël Vouitsis
- | Anchor Function for Studying Language Models (Poster) | Zhongwang Zhang · Zhiwei Wang · Zhiqin Xu
- | No Free Prune: Information-Theoretic Barriers to Pruning at Initialization (Poster) | Kevin Luo · Tanishq Kumar · Mark Sellke
- | CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text (Poster) | Zhenru Lin · Yiqun Yao · Yang Yuan
- | Understanding multimodal contrastive learning through pointwise mutual information (Poster) | Toshimitsu Uesaka · Taiji Suzuki · Yuhta Takida · Chieh-Hsin Lai · Naoki Murata · Yuki Mitsufuji
- | Optimizing for ROC Curves on Class-Imbalanced Data by Training over a Family of Loss Functions (Poster) | Kelsey Lieberman · Shuai Yuan · Swarna Ravindran · Carlo Tomasi
- | The best algorithm for adversarial training (Poster) | Nikolaos Tsilivis · Natalie Frank · Julia Kempe
- | Understanding Sub-domain Alignment for Domain Adaptation (Poster) | Yiling Liu · Juncheng Dong · Ziyang Jiang · Ahmed Aloui · Keyu Li · Michael Klein · Vahid Tarokh · David Carlson
- | Binding Dynamics in Rotating Features (Poster) | Sindy Löwe · Francesco Locatello · Max Welling
- | Do Diffusion Models Learn Semantically Meaningful and Efficient Representations? (Poster) | Qiyao Liang · Ziming Liu · Ila Fiete
- | Robust Sparse Mean Estimation via Incremental Learning (Poster) | Jianhao Ma · Rui Chen · Yinghui He · Salar Fattahi · Wei Hu
- | Towards a Theoretical Understanding of Model Collapse (Poster) | Elvis Dohmatob · Yunzhen Feng · Julia Kempe
- | Towards Understanding Clean Generalization and Robust Overfitting in Adversarial Training (Poster) | Binghui Li · Yuanzhi Li
- | Simplicity Bias of Transformers to Learn Low Sensitivity Functions (Poster) | Bhavya Vasudeva · Deqing Fu · Tianyi Zhou · Elliott Kau · Youqi Huang · Vatsal Sharan
- | HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex Environments (Poster) | Yingru Li · Jiawei Xu · Lei Han · Zhi-Quan Luo
- | Weisfeiler–Leman at the margin: When more expressivity matters (Poster) | Billy Franks · Christopher Morris · Ameya Velingker · Floris Geerts
- | On the Relationship Between Small Initialization and Flatness in Deep Networks (Poster) | Soo Min Kwon · Lijun Ding · Laura Balzano · Qing Qu
- | Bridging Lottery ticket and Grokking: Is Weight Norm Sufficient to Explain Delayed Generalization? (Poster) | Gouki Minegishi · Yusuke Iwasawa · Yutaka Matsuo
- | Optimization Effectiveness versus Generalization Capability of Stochastic Optimization Algorithms for Deep Learning (Poster) | Toki Tahmid Inan · Mingrui Liu · Amarda Shehu
- | From Generalization Analysis to Optimization Designs for State Space Models (Poster) | Fusheng Liu · Qianxiao Li
- | Cumulative Reasoning with Large Language Models (Poster) | Yifan Zhang · Jingqin Yang · Yang Yuan · Andrew Yao
- | Counting on Algorithmic Capacity: The Interplay between Mixing and Memorization in Toy Models of Transformers (Poster) | Freya Behrens · Luca Biggio · Lenka Zdeborova
- | Measuring Sharpness in Grokking (Poster) | Jack Miller · Patrick Gleeson · Noam Levi · Charles O'Neill · Thang Bui
- | Low-Rank Robust Graph Contrastive Learning (Poster) | Yancheng Wang · Yingzhen Yang
- | Active Few-Shot Fine-Tuning (Poster) | Jonas Hübotter · Bhavya · Lenart Treven · Yarden As · Andreas Krause
- | On Different Faces of Model Scaling in Supervised and Self-Supervised Learning (Poster) | Matteo Gamba · Arna Ghosh · Kumar Agrawal · Blake A Richards · Hossein Azizpour · Mårten Björkman
- | Breaking the Dimension Dependence in Sketching for Distributed Learning (Poster) | Berivan Isik · Qiaobo Li · Mayank Shrivastava · Arindam Banerjee · Sanmi Koyejo
- | On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval (Poster) | Kaiyue Wen · Xingyu Dang · Kaifeng Lyu
- | OMPO: A Unified Framework for Reinforcement Learning under Policy and Dynamics Shifts (Poster) | Yu Luo · Tianying Ji · Fuchun Sun · Jianwei Zhang · Huazhe Xu · Xianyuan Zhan
- | Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations (Poster) | Rylan Schaeffer · Berivan Isik · Dhruv Pai · Andres Carranza · Victor Lecomte · Alyssa Unell · Mikail Khona · Thomas Yerxa · Yann LeCun · SueYeon Chung · Andrey Gromov · Ravid Shwartz-Ziv · Sanmi Koyejo
- | On Learning Modular Polynomials (Poster) | Darshil Doshi · Tianyu He · Aritra Das · Andrey Gromov
- | Towards Principled Graph Transformers (Poster) | Luis Müller · Daniel Kusuma · Christopher Morris
- | How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers (Poster) | Gon Buzaglo · Itamar Harel · Mor Shpigel Nacson · Alon Brutzkus · Nathan Srebro · Daniel Soudry
- | ADOPT: Modified Adam Can Converge with the Optimal Rate with Any Hyperparameters (Poster) | Shohei Taniguchi · Masahiro Suzuki · Yusuke Iwasawa · Yutaka Matsuo
- | Training Dynamics of Multi-Head Softmax Attention: Emergence, Convergence, and Optimality (Poster) | Siyu Chen · Heejune Sheen · Zhuoran Yang · Tianhao Wang
- | Bridging Empirics and Theory: Unveiling Asymptotic Universality Across Gaussian and Gaussian Mixture Inputs in Deep Learning (Poster) | Jaeyong Bae · Hawoong Jeong
- | Variational Linearized Laplace Approximation for Bayesian Deep Learning (Poster) | Luis A. Ortega · Simon Rodriguez Santana · Daniel Hernández-Lobato
- | What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes (Poster) | Victor Lecomte · Kushal Thaman · Rylan Schaeffer · Naomi Bashkansky · Trevor Chow · Sanmi Koyejo
- | Addressing Sample Inefficiency in Multi-View Representation Learning (Poster) | Arna Ghosh · Kumar Agrawal · Adam Oberman · Blake A Richards
- | Simulating the Implicit Effect of Learning Rates in Gradient Descent (Poster) | Adrian Goldwaser · Bruno Mlodozeniec · Hong Ge
- | Provably Efficient Maximum Entropy Pure Exploration in Reinforcement Learning (Poster) | Hongyi Guo · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
- | Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning (Poster) | Raffaele Paolino · Sohir Maskey · Pascal Welke · Gitta Kutyniok
- | Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning (Poster) | Alexandru Meterez · Lorenzo Noci · Thomas Hofmann · Antonio Orvieto
- | Traversing Chemical Space with Latent Potential Flows (Poster) | Guanghao Wei · Yining Huang · Chenru Duan · Yue Song · Yuanqi Du
- | Stability Analysis of Various Symbolic Rule Extraction Methods from Recurrent Neural Network (Poster) | Neisarg Dave · Daniel Kifer · C. Giles · Ankur Mali
- | Scaling laws and Zipf's law in AlphaZero (Poster) | Oren Neumann · Claudius Gros
- | GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory (Poster) | David Baek · Ziming Liu · Max Tegmark
- | On the Theory of Skill-based Reinforcement Learning: Distribution Recovery and Generalization (Poster) | Hongyi Guo · Xiaoyu Chen · Sirui Zheng · Zhuoran Yang · Zhaoran Wang
- | A Coefficient Makes SVRG Effective (Poster) | Yida Yin · Zhiqiu Xu · Zhiyuan Li · Trevor Darrell · Zhuang Liu
- | On the Theory of Cross-Modality Distillation with Contrastive Learning (Poster) | Hangyu Lin · Chen Liu · Chengming Xu · Zhengqi Gao · Yanwei Fu · Yuan Yao
- | Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems (Poster) | Junwei Su · Difan Zou · Chuan Wu
- | Implicit Regularization of Gradient Flow for One-layer Softmax Attention (Poster) | Heejune Sheen · Siyu Chen · Tianhao Wang · Huibin Zhou
- | In-context Newton’s method for regression: Transformers can provably converge (Poster) | Angeliki Giannou · Liu Yang · Tianhao Wang · Dimitris Papailiopoulos · Jason Lee
- | PAC-Chernoff Bounds: Understanding Generalization in the Interpolation Regime (Poster) | Andres Masegosa · Luis A. Ortega
- | Is robust overfitting inevitable? An approximation viewpoint (Poster) | Zhongjie Shi · Fanghui Liu · Yuan Cao · Johan Suykens
- | Stochastic restarting to overcome overfitting in neural networks (Poster) | Yeongwoo Song · Youngkyoung Bae · Hawoong Jeong
- | Analytical Solution of Three-layer Network with Matrix Exponential Activation Function (Poster) | Kuo Gai · Shihua Zhang