2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Sang Michael Xie · Ananya Kumar · Sewon Min · Sadhika Malladi · Lucio Dery · Aditi Raghunathan · Tengyu Ma · Percy Liang
Strauss 2
Sat 11 May, midnight PDT
Foundation models (FMs) have revolutionized machine learning research across domains. These models are trained on extensive, highly varied datasets and can be quickly adapted to solve many tasks of interest. FMs are extremely effective on language (e.g., GPT-3 [1], BERT [2], PaLM [3], LLaMA [17]), vision (e.g., SimCLR [4]), speech (e.g., Whisper), and multi-modal (e.g., CLIP [5], DALL-E [6]) inputs.

However, understanding of FMs lags far behind their extraordinary performance. FMs are known for their surprising emergent capabilities, such as in-context learning [1], but rigorous characterization of such phenomena is sorely lacking. Recently, substantially smaller models (e.g., LLaMA [17]) have demonstrated performance comparable to or better than huge FMs from the previous generation (e.g., OPT [19]). These findings suggest that careful selection of data, training objectives, and adaptation methods can more effectively induce desirable properties in FMs. Development of such techniques can be accelerated through better understanding.

This workshop aims to bring together researchers who work on developing an understanding of FMs, through either careful experimentation or theoretical work. Rigorous characterization of FMs can also contribute to the broader goal of mitigating undesirable behaviors. FMs are now broadly available to users, so misaligned models present real-world risk. We thus also welcome submissions of previously unpublished work that investigates how to better characterize biases in models and align them.
Schedule
Fri 11:50 p.m. - 12:00 a.m. | Opening remarks (Intro)
Sat 12:00 a.m. - 12:40 a.m. | Invited Talk (Sasha Rush)
Sat 12:40 a.m. - 1:20 a.m. | Invited Talk (Yuandong Tian)
Sat 1:20 a.m. - 2:00 a.m. | Invited Talk (Hannaneh Hajishirzi)
Sat 2:00 a.m. - 3:00 a.m. | Spotlight Talks
Sat 4:00 a.m. - 5:00 a.m. | Poster Session
Sat 5:00 a.m. - 6:00 a.m. | Panel Discussion (Sasha Rush, Yuandong Tian, Hannaneh Hajishirzi, Jacob Steinhardt, Aditi Raghunathan)
Sat 6:00 a.m. - 6:40 a.m. | Invited Talk (Jacob Steinhardt)
Sat 6:40 a.m. - 7:20 a.m. | Invited Talk (Amir Globerson)
- Prompting a Pretrained Transformer Can Be a Universal Approximator (Poster) | Aleksandar Petrov · Adel Bibi · Philip Torr
- Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks (Poster) | Jongho Park · Jaden Park · Zheyang Xiong · Nayoung Lee · Jaewoong Cho · Samet Oymak · Kangwook Lee · Dimitris Papailiopoulos
- Robust CLIP: Unsupervised Adversarial Fine-tuning of Vision Embeddings for Robust Large Vision-Language Models (Poster) | Christian Schlarmann · Naman Singh · Francesco Croce · Matthias Hein
- Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint (Poster) | Wei Xiong · Hanze Dong · Chenlu Ye · Ziqi Wang · Han Zhong · Heng Ji · Nan Jiang · Tong Zhang
- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization (Poster) | Elan Rosenfeld · Andrej Risteski
- Editing Large Language Models: Problems, Methods, and Opportunities (Poster) | Yunzhi Yao · Peng Wang · Bozhong Tian · Siyuan Cheng · Zhoubo Li · Shumin Deng · Huajun Chen · Ningyu Zhang
- Pre-training and In-context Learning IS Bayesian Inference a la De Finetti (Poster) | Naimeng Ye · Hanming Yang · Andrew Siah · Hongseok Namkoong
- Linear Alignment of Vision-language Models for Image Captioning (Poster) | Fabian Paischer · Markus Hofmarcher · Sepp Hochreiter · Thomas Adler
- Self-Supervised Open-Ended Classification with Small Visual Language Models (Poster) | Mohammad Mahdi Derakhshani · Ivona Najdenkoska · Cees G Snoek · Marcel Worring · Yuki Asano
- Simple linear attention language models balance the recall-throughput tradeoff (Poster) | Simran Arora · Sabri Eyuboglu · Michael Zhang · Aman Timalsina · Silas Alberti · James Y Zou · Atri Rudra · Christopher Re
- Best Arm Identification for Prompt Learning under a Limited Budget (Poster) | Chengshuai Shi · Kun Yang · Jing Yang · Cong Shen
- Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study (Poster) | Jinze Zhao · Peihao Wang · Zhangyang Wang
- Do Diffusion Models Learn Semantically Meaningful and Efficient Representations? (Poster) | Qiyao Liang · Ziming Liu · Ila Fiete
- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications (Poster) | Boyi Wei · Kaixuan Huang · Yangsibo Huang · Tinghao Xie · Xiangyu Qi · Mengzhou Xia · Prateek Mittal · Mengdi Wang · Peter Henderson
- Does Data Contamination Make a Difference? Insights from Intentionally Contamination Pre-training Data For Language Models (Poster) | Minhao Jiang · Ken Liu · Ming Zhong · Rylan Schaeffer · Siru Ouyang · Jiawei Han · Sanmi Koyejo
- Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? (Poster) | Guijin Son · Sangwon Baek · Sangdae Nam · Ilgyun Jeong · Seungone Kim
- QuRating: Selecting High-Quality Data for Training Language Models (Poster) | Alexander Wettig · Aatmik Gupta · Saumya Malik · Danqi Chen
- On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval (Poster) | Kaiyue Wen · Xingyu Dang · Kaifeng Lyu
- Towards an Empirical Understanding of MoE Design Choices (Poster) | Dongyang Fan · Bettina Messmer · Martin Jaggi
- What makes vision transformers robust towards bit-flip attack? (Poster) | Xuan Zhou · Souvik Kundu · Peter Beerel
- On provable length and compositional generalization (Poster) | Kartik Ahuja · Amin Mansouri
- Eliciting Latent Knowledge from Quirky Language Models (Poster) | Alex Mallen · Nora Belrose
- The Effect of Model Capacity on the Emergence of In-Context Learning (Poster) | Berkan Ottlik · Narutatsu Ri · Daniel Hsu · Clayton Sanford
- MathSensei: Mathematical Reasoning with a Tool-Augmented Large Language Model (Poster) | Debrup Das · Debopriyo Banerjee · Somak Aditya · Ashish Kulkarni
- Preserving Principal Subspaces to Reduce Catastrophic Forgetting in Fine-tuning (Poster) | Jörg Franke · Michael Hefenbrock · Frank Hutter
- Transformers Learn Nonlinear Features In Context (Poster) | Juno Kim · Taiji Suzuki
- Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations (Poster) | Rylan Schaeffer · Berivan Isik · Dhruv Pai · Andres Carranza · Victor Lecomte · Alyssa Unell · Mikail Khona · Thomas Yerxa · Yann LeCun · SueYeon Chung · Andrey Gromov · Ravid Shwartz-Ziv · Sanmi Koyejo
- Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task (Poster) | Jannik Brinkmann · Abhay Sheshadri · Victor Levoso · Paul Swoboda · Christian Bartelt
- Selecting Large Language Model to Fine-tune via Rectified Scaling Law (Poster) | Haowei Lin · Baizhou Huang · Haotian Ye · Qinyu Chen · Zihao Wang · Sujian Li · Jianzhu Ma · Xiaojun Wan · James Y Zou · Yitao Liang
- Zero-Shot Recognition with Guided Cropping (Poster) | Piyapat Saranrittichai · Mauricio Munoz · Volker Fischer · Chaithanya Kumar Mummadi
- Few-Shot Adaptation of Vision-Language Foundation Models via Dual-Path Inference (Poster) | Ce Zhang · Simon Stepputtis · Katia Sycara · Yaqi Xie
- SparQ Attention: Bandwidth-Efficient LLM Inference (Poster) | Luka Ribar · Ivan Chelombiev · Luke Hudlass-Galley · Charles Blake · Carlo Luschi · Douglas Orr
- Understanding and Improving In-Context Learning on Vision-language Models (Poster) | Shuo Chen · Zhen Han · Bailan He · Mark Buckley · Philip Torr · Volker Tresp · Jindong Gu
- Scaling Laws for Downstream Task Performance of Large Language Models (Poster) | Berivan Isik · Natalia Ponomareva · Hussein Hazimeh · Dimitris Paparas · Sergei Vassilvitskii · Sanmi Koyejo
- Transformers' Spectral Bias and The Symmetric Group (Poster) | Itay Lavie · Guy Gur-Ari · Zohar Ringel
- Asymmetry in Low-Rank Adapters of Foundation Models (Poster) | Jiacheng Zhu · Kristjan Greenewald · Kimia Nadjahi · Haitz Sáez de Ocáriz Borde · Rickard Gabrielsson · Leshem Choshen · Marzyeh Ghassemi · Mikhail Yurochkin · Justin Solomon
- Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT (Poster) | Jon Saad-Falcon · Dan Fu · Simran Arora · Neel Guha · Christopher Re
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (Poster) | Maciej Pióro · Kamil Ciebiera · Krystian Król · Jan Ludziejewski · Michał Krutul · Jakub Krajewski · Szymon Antoniak · Piotr Miłoś · Marek Cygan · Sebastian Jaszczur
- Concept-aware Data Construction Improves In-context Learning of Language Models (Poster) | Michal Štefánik · Marek Kadlčík · Petr Sojka
- Uncovering Mesa-Optimization Algorithms in Transformers (Poster) | Johannes von Oswald · Eyvind Niklasson · Maximilian Schlegel · Alexander Meulemans · Seijin Kobayashi · Nicolas Zucchet · Nino Scherrer · Nolan Miller · Mark Sandler · Blaise Aguera y Arcas · Max Vladymyrov · Razvan Pascanu · Joao Sacramento
- How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation (Poster) | Zhongyi Han · Guanglin Zhou · Rundong He · Jindong Wang · Tailin Wu · Yilong Yin · Salman Khan · Lina Yao · Tongliang Liu · Kun Zhang
- Scaling Laws for Fine-Grained Mixture of Experts (Poster) | Jan Ludziejewski · Jakub Krajewski · Kamil Adamczewski · Maciej Pióro · Michał Krutul · Szymon Antoniak · Kamil Ciebiera · Krystian Król · Tomasz Odrzygóźdź · Piotr Sankowski · Marek Cygan · Sebastian Jaszczur
- Is Mamba Capable of In-Context Learning? (Poster) | Riccardo Grazzi · Julien Siems · Simon Schrodi · Thomas Brox · Frank Hutter
- Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting (Poster) | Guande He · Peng Cui · Jianfei Chen · Wenbo Hu · Jun Zhu
- Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Math Reasoning and Modular Arithmetic (Poster) | Jiuxiang Gu · Chenyang Li · Yingyu Liang · Zhenmei Shi · Zhao Song · Tianyi Zhou
- Lessons from Identifiability for Understanding Large Language Models (Poster) | Patrik Reizinger · Szilvia Ujváry · Anna Mészáros · Anna Kerekes · Wieland Brendel · Ferenc Huszar
- Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation (Poster) | Seongyun Lee · Seungone Kim · Sue Park · Geewook Kim · Minjoon Seo
- Provably Robust DPO: Aligning Language Models with Noisy Feedback (Poster) | Sayak Ray Chowdhury · Anush Kini · Nagarajan Natarajan
- In-Context Data Distillation with TabPFN (Poster) | Junwei (Jeremy) Ma · Valentin Thomas · Guangwei Yu · Anthony Caterini
- Transformers Can Achieve Length Generalization But Not Robustly (Poster) | Yongchao Zhou · Uri Alon · Xinyun Chen · Xuezhi Wang · Rishabh Agarwal · Denny Zhou
- Scalable Ensembling For Mitigating Reward Overoptimisation (Poster) | Ahmed Ahmed · Rafael Rafailov · Stepan Sharkov · Xuechen Li · Sanmi Koyejo
- tinyBenchmarks: evaluating LLMs with fewer examples (Poster) | Felipe Polo · Lucas Weber · Leshem Choshen · Yuekai Sun · Gongjun Xu · Mikhail Yurochkin
- ShERPA: Leveraging Neuron Alignment for Knowledge-preserving Fine-tuning (Poster) | Dongkyu Cho · Jinseok Yang · Jun Seo · Seohui Bae · Dongwan Kang · Soyeon Park · Hyeokjun Choe · Woohyung Lim
- Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems (Poster) | David Hoffmann · Simon Schrodi · Jelena Bratulić · Nadine Behrmann · Volker Fischer · Thomas Brox
- Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP (Poster) | Laura Niss · Kevin Vogt-Lowell · Theodoros Tsiligkaridis
- Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok (Poster) | Tikeng Notsawo Pascal Junior · Hattie Zhou · Mohammad Pezeshki · Irina Rish · Guillaume Dumas
- LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression (Poster) | Huiqiang Jiang · Qianhui Wu · Xufang Luo · Dongsheng Li · Chin-Yew Lin · Yuqing Yang · Lili Qiu
- "I'm not Racist but…": Discovering Bias in the Internal Knowledge of Large Language Models (Poster) | Abel Salinas · Louis Penafiel · Robert McCormack · Fred Morstatter
- LangBridge: Multilingual Reasoning Without Multilingual Supervision (Poster) | Dongkeun Yoon · Joel Jang · Sungdong Kim · Seungone Kim · Sheikh Shafayat · Minjoon Seo
- Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning (Poster) | Simon Schrodi · David Hoffmann · Max Argus · Volker Fischer · Thomas Brox
- Unsupervised Domain Adaptation within Deep Foundation Latent Spaces (Poster) | Dmitry Kangin · Plamen Angelov
- On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models (Poster) | Ken Liu · Zhoujie Ding · Berivan Isik · Sanmi Koyejo
- Dual Operating Modes of In-Context Learning (Poster) | Ziqian Lin · Kangwook Lee
- Dichotomy in Compositional Reasoning: Scaling and Limitations of LLMs in Composite Task (Poster) | Zhuoyan Xu · Zhenmei Shi · Yingyu Liang
- Can Generative Multimodal Models Count to Ten? (Poster) | Sunayana Rane · Alexander Ku · Jason Baldridge · Ian Tenney · Thomas L. Griffiths · Been Kim
- Orchid: Flexible and Data-Adaptive Convolution for Sequence Modeling (Poster) | Mahdi Karami · Ali Ghodsi
- Attributing Mode Collapse in the fine-tuning of Large Language Models (Poster) | Laura O'Mahony · Leo Grinsztajn · Hailey Schoelkopf · Stella R Biderman
- Massive Activations in Large Language Models (Poster) | Mingjie Sun · Xinlei Chen · J Kolter · Zhuang Liu
- GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks (Poster) | Shivanshu Gupta · Clemens Rosenbaum · Ethan Elenberg
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models (Poster) | Bo-Kyeong Kim · Geonmin Kim · Tae-Ho Kim · Thibault Castells · Shinkook Choi · Junho Shin · Hyoung-Kyu Song
- Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-tuned LLMs (Poster) | Raghavv Goel · Mukul Gagrani · Wonseok Jeon · Junyoung Park · Mingu Lee · Christopher Lott
- BlackMamba: Mixture of Experts for State-Space Models (Poster) | Quentin Anthony · Yury Tokpanov · Paolo Glorioso · Beren Millidge
- Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models (Poster) | Zachary Ankner · Cody Blakeney · Kartik Sreenivasan · Max M Marion · Matthew Leavitt · Mansheej Paul
- Prompting a Pretrained Transformer Can Be a Universal Approximator (Oral) | Aleksandar Petrov · Adel Bibi · Philip Torr
- Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint (Oral) | Wei Xiong · Hanze Dong · Chenlu Ye · Ziqi Wang · Han Zhong · Heng Ji · Nan Jiang · Tong Zhang
- Selecting Large Language Model to Fine-tune via Rectified Scaling Law (Oral) | Haowei Lin · Baizhou Huang · Haotian Ye · Qinyu Chen · Zihao Wang · Sujian Li · Jianzhu Ma · Xiaojun Wan · James Y Zou · Yitao Liang
- Uncovering Mesa-Optimization Algorithms in Transformers (Oral) | Johannes von Oswald · Eyvind Niklasson · Maximilian Schlegel · Alexander Meulemans · Seijin Kobayashi · Nicolas Zucchet · Nino Scherrer · Nolan Miller · Mark Sandler · Blaise Aguera y Arcas · Max Vladymyrov · Razvan Pascanu · Joao Sacramento
- Scaling Laws for Fine-Grained Mixture of Experts (Oral) | Jan Ludziejewski · Jakub Krajewski · Kamil Adamczewski · Maciej Pióro · Michał Krutul · Szymon Antoniak · Kamil Ciebiera · Krystian Król · Tomasz Odrzygóźdź · Piotr Sankowski · Marek Cygan · Sebastian Jaszczur
- Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning (Oral) | Simon Schrodi · David Hoffmann · Max Argus · Volker Fischer · Thomas Brox
- Massive Activations in Large Language Models (Oral) | Mingjie Sun · Xinlei Chen · J Kolter · Zhuang Liu