Foundation models (FMs) are models trained on a large and diverse pool of data that can be adapted to a wide range of tasks. Recent examples of FMs include large language models (GPT-3, BERT, PaLM), image representation encoders (SimCLR), and image-text models (CLIP, DALL-E), all of which have revolutionized the way models are built in their domains. Foundation models remain poorly understood: their core driving principle is transfer learning, but scale and modern self-supervision techniques have led to emergent capabilities we might not have anticipated. The goal of this workshop is to highlight research that aims to improve our understanding of FMs. We interpret understanding liberally, ranging from purely empirical papers that highlight interesting phenomena to those that attempt to explain or provide theoretical foundations for such phenomena in potentially simplified settings.
Thu 12:15 a.m. - 12:45 a.m.
|
Invited Talk (Yann Dauphin): Leveraging Multiple Models and Multiple Tasks
(
Invited Talk
)
SlidesLive Video » Abstract: In recent years, there has been a surge in the number of trained models and datasets shared online. In this talk, we will investigate methods that allow us to leverage this trend. First, we will show that ensembles that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors. We show these models specialize in subdomains of the data, leading to higher ensemble performance: with just 2 models (each with 76.5% ImageNet accuracy), we can create ensembles reaching 83.4% (a nearly 7-point boost). Second, we will discuss a method to make use of auxiliary tasks using an algorithm called ATTITTUD. This approach allows fine-grained resolution of conflicts between the gradient of the auxiliary task and that of the primary task. We will show that this approach produces significant improvements on benchmark tasks such as CheXpert. Bio: Yann N. Dauphin is a machine learning researcher at Google Research working on understanding the fundamentals of deep learning algorithms and leveraging that understanding in various applications. He has published seminal work on understanding the loss surface of neural nets. Prior to joining Google in 2019, he was a researcher at Facebook AI Research from 2015 to 2018, where his work led to award-winning scientific publications and helped improve automatic translation on Facebook.com. He completed his PhD at the University of Montreal under the supervision of Prof. Yoshua Bengio. During this time, he and his team won international machine learning competitions such as the Unsupervised Transfer Learning Challenge in 2013. |
Yann Dauphin 🔗 |
Thu 12:45 a.m. - 12:50 a.m.
|
Q&A
|
🔗 |
Thu 12:50 a.m. - 1:20 a.m.
|
Invited Talk (Jared Kaplan): AI Safety, RLHF, and Self-Supervision
(
Invited Talk
)
SlidesLive Video » |
Jared Kaplan 🔗 |
Thu 1:20 a.m. - 1:25 a.m.
|
Q&A
|
🔗 |
Thu 1:25 a.m. - 1:35 a.m.
|
Coffee Break
|
🔗 |
Thu 1:35 a.m. - 2:05 a.m.
|
Invited Talk (Lenka Zdeborová): Insights from exactly solvable high-dimensional models
(
Invited Talk
)
SlidesLive Video » Statistical physics has studied exactly solvable models of neural networks for more than four decades. In this talk, we will put this line of work in the perspective of recent empirical observations stemming from deep learning. We will describe several types of phase transition that appear in the limit of large sizes as a function of the amount of data. Discontinuous phase transitions are linked to an adjacent phase of algorithmic hardness. This so-called hard phase influences the behaviour of gradient-descent-like algorithms. We show a case where the hardness is mitigated by overparametrization, suggesting that the benefits of overparametrization may be linked to the use of a certain type of algorithm. We then discuss the overconfidence of overparametrized neural networks and evaluate methods to mitigate it and calibrate the uncertainty. |
Lenka Zdeborova 🔗 |
Thu 2:05 a.m. - 2:10 a.m.
|
Q&A
|
🔗 |
Thu 2:10 a.m. - 2:40 a.m.
|
Invited Talk (Sanjeev Arora): Task-specific Skill Localization in Fine-tuned Language Models
(
Invited Talk
)
SlidesLive Video » |
Sanjeev Arora 🔗 |
Thu 2:40 a.m. - 2:45 a.m.
|
Q&A
|
🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Accelerating Neural Self-Improvement via Bootstrapping
(
Poster
)
link »
Few-shot learning with sequence-processing neural networks (NNs) has recently attracted a new wave of attention in the context of large language models. In the standard N-way K-shot learning setting, an NN is explicitly optimised to learn to classify unlabelled inputs by observing a sequence of NK labelled examples. This pressures the NN to learn a learning algorithm that achieves maximum performance, given the limited number of training examples. Here we study an auxiliary loss that encourages further acceleration of few-shot learning, by applying recently proposed bootstrapped meta-learning to NN few-shot learners: we optimise the K-shot learner to match its own performance achievable by observing more than NK examples, using only NK examples. Promising results are obtained on the standard Mini-ImageNet dataset. |
Kazuki Irie · Jürgen Schmidhuber 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Mini-Batch Optimization of Contrastive Loss
(
Poster
)
link »
In this paper, we study the effect of mini-batch selection on contrastive loss and propose new mini-batch selection methods to improve efficiency. Theoretically, we show that both the full-batch and mini-batch settings share the same solution, the simplex Equiangular Tight Frame (ETF), if all $\binom{N}{B}$ mini-batches are seen during training. However, when not all possible batches are seen, mini-batch training can lead to suboptimal solutions. To address this issue, we propose efficient mini-batch selection methods that compare favorably with existing methods. Our experimental results demonstrate the effectiveness of our proposed methods in finding a near-optimal solution with a reduced number of gradient steps and outperforming existing mini-batch selection methods.
|
Kartik Sreenivasan · Keon Lee · Jeong-Gwan Lee · Anna Lee · Jaewoong Cho · Jy-yong Sohn · Dimitris Papailiopoulos · Kangwook Lee 🔗 |
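For readers who want to see the objective being analyzed, a minimal PyTorch sketch of a per-mini-batch contrastive (NT-Xent-style) loss is given below; the batch size, temperature, and random inputs are illustrative placeholders, and the mini-batch selection strategies proposed in the paper are not shown.

```python
import torch
import torch.nn.functional as F

def minibatch_contrastive_loss(z1, z2, temperature=0.5):
    """NT-Xent-style contrastive loss over one mini-batch of B positive pairs.

    z1, z2: (B, d) embeddings of two augmented views; row i of z1 and z2
    form a positive pair, all other rows in the batch act as negatives.
    """
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, d)
    sim = z @ z.t() / temperature                        # (2B, 2B)
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # The positive for index i is index i + B (and vice versa).
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets)

# Toy usage with random "embeddings" standing in for an encoder's outputs.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(minibatch_contrastive_loss(z1, z2))
```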
Thu 4:00 a.m. - 5:00 a.m.
|
On the Role of Attention in Prompt-tuning
(
Poster
)
link »
Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture models where each input token belongs to a context-relevant or -irrelevant set. We isolate the role of prompt-tuning through a self-contained prompt-attention model. Our contributions are as follows: (1) We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention under our contextual data model. (2) We analyze the initial trajectory of gradient descent and show that it learns the prompt and prediction head with near-optimal sample complexity, and we demonstrate how the prompt can provably attend to sparse context-relevant tokens. We also provide experiments that verify our theoretical insights on real datasets and demonstrate how prompt-tuning enables the model to attend to context-relevant information. |
Samet Oymak · Ankit Singh Rawat · Mahdi Soltanolkotabi · Christos Thrampoulidis 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Looped Transformers as Programmable Computers
(
Poster
)
link »
We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including lexicographic operations, non-linear functions, function calls, program counters, and conditional branches. Using this framework, we emulate a computer using a simple instruction-set architecture, which allows us to map iterative algorithms to programs that can be executed by a constant-depth looped transformer network. We show how a single frozen transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and even a full backpropagation-based in-context learning algorithm. Our findings reveal the potential of transformer networks as programmable compute units and offer insight into the mechanics of attention. |
Angeliki Giannou · Shashank Rajput · Jy-yong Sohn · Kangwook Lee · Jason Lee · Dimitris Papailiopoulos 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Diffusion Models are Minimax Optimal Distribution Estimators
(
Poster
)
link »
We provide the first rigorous analysis of estimation error bounds for diffusion modeling over well-known function spaces. The highlight of this paper is that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves nearly minimax optimal estimation rates in the total variation distance and in the Wasserstein distance of order one. We expect these results to advance the theoretical understanding of diffusion modeling and its ability to generate verisimilar outputs. |
Kazusato Oko · Akiyama Shunta · Taiji Suzuki 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
The Effects of Pretraining Task Diversity on In-Context Learning of Ridge Regression
(
Poster
)
link »
Pretrained transformers can do in-context learning (ICL), i.e. learn new tasks in the forward pass from a few examples provided in context. But can the model do ICL for completely new tasks or is this ability restricted to tasks similar to those seen during pretraining? How does the diversity of tasks seen during pretraining affect the model's ability to do ICL? In the setting of ICL for ridge regression, we show that, if pretrained on few tasks sampled from a latent distribution, the model behaves like the Bayesian estimator with a prior equal to the discrete distribution over the sampled tasks. But if pretrained on a sufficiently large number of tasks, the model behaves like the Bayesian estimator with prior equal to the underlying latent distribution over tasks. Our results suggest that, as the diversity of the pretraining dataset increases, the model transitions from doing ICL on tasks similar to ones seen during pretraining to learning the underlying task structure and doing ICL on new tasks. |
Allan Raventos · Mansheej Paul · Feng Chen · Surya Ganguli 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Conservative Prediction via Transductive Confidence Minimization
(
Poster
)
link »
Errors of machine learning models can be prohibitively costly, especially in safety-critical settings such as healthcare. However, machine learning may be applicable to such scenarios if the learned model can abstain and defer to a human on difficult examples instead of making errors. In safety-critical settings, we prefer conservative models that defer to humans at the cost of some overall accuracy. Unfortunately, selective classification and out-of-distribution detection are notably difficult as it is hard to anticipate all possible examples. To mitigate this challenge, we focus on the transductive setting, where unlabeled examples from the test distribution are available during training. We propose transductive confidence minimization (TCM), which minimizes prediction confidence on unlabeled test examples while simultaneously optimizing the training objective. We theoretically show that TCM learns a lower bound on the true confidence, and that this property can be leveraged to provably detect examples that are sufficiently different from training examples, regardless of what distribution they came from. In our experiments, TCM consistently shows high performance, achieving the highest OOD detection performance compared to 6 other methods on 9 out of 10 ID->OOD pairs and consistently outperforming methods for selective classification in settings where we test on data from a previously unseen distribution. |
Caroline Choi · Fahim Tajwar · Yoonho Lee · Huaxiu Yao · Ananya Kumar · Chelsea Finn 🔗 |
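A rough sketch of the transductive confidence-minimization idea described in the abstract above, combining the supervised training objective with a confidence penalty on unlabeled test inputs; the specific confidence measure (max softmax probability) and the weight `lam` are illustrative assumptions, not necessarily the authors' exact choices.

```python
import torch
import torch.nn.functional as F

def tcm_style_loss(model, x_train, y_train, x_unlabeled_test, lam=1.0):
    """Supervised loss on labeled training data plus a confidence penalty
    on unlabeled test inputs (transductive setting)."""
    # Standard training objective on the labeled source data.
    supervised = F.cross_entropy(model(x_train), y_train)

    # Penalize confidence (here: max softmax probability) on unlabeled test data.
    test_probs = F.softmax(model(x_unlabeled_test), dim=1)
    confidence = test_probs.max(dim=1).values.mean()

    return supervised + lam * confidence

# Toy usage with a linear classifier standing in for the real model.
model = torch.nn.Linear(16, 3)
x_tr, y_tr = torch.randn(32, 16), torch.randint(0, 3, (32,))
x_te = torch.randn(20, 16)
print(tcm_style_loss(model, x_tr, y_tr, x_te))
```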
Thu 4:00 a.m. - 5:00 a.m.
|
Controlled assessment of CLIP-style language-aligned vision models in prediction of brain & behavioral data
(
Poster
)
link »
One of the core algorithmic forces driving the development of modern foundation models is the use of contrastive language alignment to facilitate more robust visual representation learning. The clear benefits conferred by CLIP-style multimodal objective functions in computer vision have generated a frenzy of interest in the application of these models to a long-debated question in cognitive neuroscience: to what extent does language shape perceptual representation in the human mind? In this work, we explore this question in two distinct domains: the prediction of brain activity in the human ventral visual system (as measured by high-resolution fMRI), and the prediction of visually evoked affect in human image assessment (as measured by self-report). In both of these cases, we leverage popular open-source foundation models (e.g. OpenAI's CLIP) in conjunction with empirically controlled alternatives (e.g. Meta AI's SLIP models) to better isolate the effects of language alignment while holding architecture and dataset constant. These controlled experiments offer mixed evidence regarding the influence of language on perceptual representation: specifically, when architecture and dataset are held constant, we find no evidence that language-alignment improves the brain predictivity of vision models, but we do find strong evidence that it increases predictivity of behavioral image assessments. We offer these examples as a case study in the urgency of injecting greater empirical control into the development and evaluation of foundation models, whose emergent properties may be attributable to a variety of sources that only systematic model comparison can fully disentangle. |
Colin Conwell · Jacob Prince · Christopher Hamblin · George Alvarez 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
The Independent Compositional Subspace Hypothesis for the Structure of CLIP's Last Layer
(
Poster
)
link »
In this paper, we propose a hypothesis which posits that CLIP disentangles compositional visual attributes into orthogonal, independent subspaces which CLIP uses to build compositional representations of images. Our hypothesis suggests that CLIP learns compositional techniques that are similar to humans'. We find five core compositional attributes predicted by the hypothesis: color, size, counting, camera view, and pattern. We empirically test their properties and find that they code for their respective compositional attribute type and are essentially orthogonal to one another, as well as the subject of the image. |
Max Wolff · Wieland Brendel · Stuart Wolff 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Exploring Demonstration Ensembling for In-context Learning
(
Poster
)
link »
In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for desired tasks, i.e., demonstrations. The standard approach for ICL is to prompt the LM with concatenated demonstrations followed by the test input. This approach suffers from some issues. First, concatenation offers almost no control over the contribution of each demonstration to the model prediction. This can be sub-optimal when some demonstrations are not very relevant to the test example. Second, due to the input length limit of transformer models, it can be infeasible to fit many examples into the context, especially when dealing with long-input tasks. In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation. DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and then combines the output probabilities resulting from each subset to produce the final prediction. We study different ensembling methods using GPT-J and experiment on 7 different language tasks. Our experiments show max ensembling to outperform concatenation by an average of 3.8 points. |
Muhammad Khalifa · Lajanugen Logeswaran · Moontae Lee · Honglak Lee · Lu Wang 🔗 |
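To make the bucketed ensembling described above concrete, here is a schematic sketch; `label_probs` is a hypothetical stand-in for querying a language model for label probabilities given a prompt, and the bucketing and prompt format are simplified assumptions.

```python
import random

def dense_predict(demos, test_input, labels, label_probs, n_buckets=3, combine="max"):
    """Demonstration ensembling: score the test input under several subsets
    (buckets) of demonstrations and combine the per-bucket label probabilities.

    demos: list of (input, output) pairs; label_probs(prompt, labels) is assumed
    to return a dict {label: probability} from some language model.
    """
    demos = list(demos)
    random.shuffle(demos)
    buckets = [demos[i::n_buckets] for i in range(n_buckets)]

    per_bucket = []
    for bucket in buckets:
        prompt = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in bucket)
        prompt += f"\nInput: {test_input}\nOutput:"
        per_bucket.append(label_probs(prompt, labels))

    if combine == "max":   # max ensembling, the best-performing variant in the study
        scores = {l: max(p[l] for p in per_bucket) for l in labels}
    else:                  # mean ensembling
        scores = {l: sum(p[l] for p in per_bucket) / n_buckets for l in labels}
    return max(scores, key=scores.get)

# Toy usage with a dummy scorer in place of a real LM.
dummy = lambda prompt, labels: {l: random.random() for l in labels}
demos = [("great movie", "positive"), ("terrible plot", "negative")] * 3
print(dense_predict(demos, "loved it", ["positive", "negative"], dummy))
```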
Thu 4:00 a.m. - 5:00 a.m.
|
A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks
(
Poster
)
link »
Grokking is a phenomenon where a model trained on an algorithmic task first overfits but, then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure of networks undergoing grokking on the sparse parity task, and find that the grokking phase transition corresponds to the emergence of a sparse subnetwork that dominates model predictions. On an optimization level, we find that this subnetwork arises when a small subset of neurons undergoes rapid norm growth, whereas the other neurons in the network decay slowly in norm. Thus, we suggest that the grokking phase transition can be understood to emerge from competition of two largely distinct subnetworks: a dense one that dominates before the transition and generalizes poorly, and a sparse one that dominates afterwards. |
William Merrill · Nikolaos Tsilivis · Aman Shukla 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
A Comprehensive Benchmark of Human-Like Relational Reasoning for Text-to-Image Foundation Models
(
Poster
)
link »
Relations are basic building blocks of human cognition. Classic and recent work suggests that many relations are early developing, and quickly perceived. Machine models that aspire to human-level perception and reasoning should reflect the ability to recognize and reason generatively about relations. We report a systematic empirical examination of a recent text-guided image generation model (DALL-E 2), using a set of 15 basic physical and social relations studied or proposed in the literature, and judgements from human participants (N = 169). Overall, we find that only 22% of images matched basic relation prompts. Based on a quantitative examination of people's judgments, we suggest that current image generation models do not yet have a grasp of even basic relations involving simple objects and agents. We examine reasons for model successes and failures, and suggest possible improvements based on computations observed in biological intelligence. |
Colin Conwell · Tomer Ullman 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations
(
Poster
)
link »
Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of their representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network, primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that it re-organizes the information to be more similar to pre-trained joint embedding models. |
Shashank Shekhar · Florian Bordes · Pascal Vincent · Ari Morcos 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Robustness of edited neural networks
(
Poster
)
link »
Successful deployment in uncertain, real-world environments requires that deep learning models can be efficiently and reliably modified in order to adapt to unexpected issues. However, the current trend toward ever-larger models makes standard retraining procedures an ever-more expensive burden. For this reason, there is growing interest in model editing, which enables computationally inexpensive, interpretable, post-hoc model modifications. While many model editing techniques are promising, research on the properties of edited models is largely limited to evaluation of validation accuracy. The robustness of edited models is an important and yet mostly unexplored topic. In this paper, we employ recently developed techniques from the field of deep learning robustness to investigate both how model editing affects the general robustness of a model, as well as the robustness of the specific behavior targeted by the edit. We find that edits tend to reduce general robustness, but that the degree of degradation depends on the editing algorithm chosen. In particular, robustness is best preserved by more constrained techniques that modify less of the model. Motivated by these observations, we introduce two new model editing algorithms, direct low-rank model editing and 1-layer interpolation (1-LI), which each exhibit strong generalization performance. |
Davis Brown · Charles Godfrey · Cody Nizinski · Jonathan Tu · Henry Kvinge 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension
(
Poster
)
link »
Prompting has become an important mechanism by which users can more effectively interact with many flavors of foundation model. Indeed, the last several years have shown that well-honed prompts can sometimes unlock emergent capabilities within such models. While there has been a substantial amount of empirical exploration of prompting within the community, relatively few works have studied prompting at a mathematical level. In this work we aim to take a first step towards understanding basic geometric properties induced by prompts in Stable Diffusion, focusing on the intrinsic dimension of internal representations within the model. We find that choice of prompt has a substantial impact on the intrinsic dimension of representations at both layers of the model which we explored, but that the nature of this impact depends on the layer being considered. For example, in certain bottleneck layers of the model, intrinsic dimension of representations is correlated with prompt perplexity (measured using a surrogate model), while this correlation is not apparent in the latent layers. Our evidence suggests that intrinsic dimension could be a useful tool for future studies of the impact of different prompts on text-to-image models. |
Henry Kvinge · Davis Brown · Charles Godfrey 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
(
Poster
)
link »
Chain-of-Thought (CoT) prompting, which encourages language models (LMs) to generate intermediate rationales for the final answer through in-context demonstrations, dramatically improves large LMs' ability to solve reasoning tasks. Despite its success, there is little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that prompting with invalid demonstrations has surprisingly little effect on CoT reasoning, achieving over 80-90% of the performance obtained using the original CoT under various metrics while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are the actual keys to the effectiveness of CoT. Overall, these findings deepen our understanding of CoT prompting, while leading to new questions regarding large LMs’ capability to learn to reason in context and reflections on benchmarking few-shot reasoning. |
Boshi Wang · Sewon Min · Xiang Deng · Jiaming Shen · You Wu · Luke Zettlemoyer · Huan Sun 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
(
Poster
)
link »
Progress in machine learning has been driven in large part by massive increases in data. However, large web-scale datasets such as LAION are largely uncurated beyond searches for exact duplicates, potentially leaving much redundancy. Here, we introduce SemDeDup, a method which leverages embeddings from pre-trained models to identify and remove "semantic duplicates": data pairs which are semantically similar, but not exactly identical. Removing semantic duplicates preserves performance and speeds up learning. Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time. Moreover, out-of-distribution performance increases. Also, analyzing language models trained on C4, a partially curated dataset, we show that SemDeDup improves over prior approaches. SemDeDup provides an example of how simple ways of leveraging quality embeddings can be used to make models learn faster with less data. |
Amro Kamal · Kushal Tirumala · Daniel Simig · Surya Ganguli · Ari Morcos 🔗 |
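A minimal sketch of the semantic-deduplication idea described above, in plain NumPy; the similarity threshold is an illustrative choice, and the clustering step the paper uses to scale to web-sized corpora is replaced here by an all-pairs comparison.

```python
import numpy as np

def semdedup(embeddings, threshold=0.95):
    """Greedily drop one item from every pair whose cosine similarity exceeds
    `threshold`. Returns indices of the examples to keep.

    embeddings: (N, d) array from a pre-trained encoder. At web scale one would
    first cluster and deduplicate within clusters; here we compare all pairs.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    keep = np.ones(len(z), dtype=bool)
    for i in range(len(z)):
        if not keep[i]:
            continue
        # Remove later items that are near-duplicates of item i.
        dup = sim[i, i + 1:] > threshold
        keep[i + 1:][dup] = False
    return np.where(keep)[0]

# Toy usage: ten random embeddings plus one near-duplicate of the first.
emb = np.random.randn(10, 64)
emb = np.vstack([emb, emb[0] + 1e-3 * np.random.randn(64)])
print(semdedup(emb))
```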
Thu 4:00 a.m. - 5:00 a.m.
|
Effective Data Augmentation With Diffusion Models
(
Poster
)
link »
Data augmentation is one of the most prevalent tools in deep learning, underpinning many recent advances, including those from classification, generative models, and representation learning. The standard approach to data augmentation combines simple transformations like rotations and flips to generate new images from existing ones. However, these new images lack diversity along key semantic axes present in the data. Consider the task of recognizing different animals. Current augmentations fail to produce diversity in task-relevant high-level semantic attributes like the species of the animal. We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our approach on image classification tasks in a few-shot setting, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains. |
Brandon Trabucco · Kyle Doherty · Max Gurinas · Ruslan Salakhutdinov 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Text-to-Image Diffusion Models are Zero-Shot Classifiers
(
Poster
)
link »
Text-to-image diffusion models have demonstrated remarkable generative capabilities, suggesting they learn informative representations of image-text data. However, their abilities are not fully understood and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers. The key idea is using a diffusion model's ability to denoise a noised image given a textual description of a label as a proxy for that label's likelihood. We apply our method to Imagen, using it to probe fine-grained aspects of Imagen's knowledge and comparing it with CLIP's zero-shot abilities. Imagen performs competitively with CLIP on a wide range of zero-shot image classification datasets. Additionally, it is more robust than CLIP and can successfully perform attribute binding while CLIP does not. Although generative pre-training is common in NLP, visual foundation models often use other methods such as contrastive learning. Based on our findings, we argue that generative pre-training should be explored as a compelling alternative for visual and vision-language problems. |
Kevin Clark · Priyank Jaini 🔗 |
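The scoring rule below illustrates the general recipe described in the abstract (noise the image, then pick the label whose text conditioning best predicts the noise); `denoiser` and `encode_text` are hypothetical stand-ins for a text-conditioned diffusion model's components, not Imagen's actual interface, and a full implementation would also average over diffusion timesteps.

```python
import torch

def diffusion_zero_shot_classify(image, class_names, denoiser, encode_text,
                                 n_trials=8, sigma=0.5):
    """Score each label by how well the diffusion model denoises the image when
    conditioned on a prompt naming that label; lower error = more likely label.

    denoiser(noisy_image, text_embedding) -> predicted noise  (assumed interface)
    encode_text(prompt) -> text embedding                      (assumed interface)
    """
    errors = []
    for name in class_names:
        text = encode_text(f"a photo of a {name}")
        err = 0.0
        for _ in range(n_trials):                 # average over noise draws
            noise = sigma * torch.randn_like(image)
            pred = denoiser(image + noise, text)
            err += torch.mean((pred - noise) ** 2)
        errors.append(err / n_trials)
    return class_names[int(torch.argmin(torch.stack(errors)))]

# Toy usage with dummy stand-ins for the diffusion model components.
dummy_denoise = lambda x, t: torch.zeros_like(x)
dummy_text = lambda s: None
img = torch.randn(3, 64, 64)
print(diffusion_zero_shot_classify(img, ["cat", "dog"], dummy_denoise, dummy_text))
```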
Thu 4:00 a.m. - 5:00 a.m.
|
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
(
Poster
)
link »
State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime performance. We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence. We find that simply squashing the long convolutional kernel weights is enough to match SSMs in performance on a range of tasks including the long range arena (LRA) and language modeling. To also improve runtime performance, we next develop FlashButterfly, an IO-aware algorithm to compute long convolutions efficiently. FlashButterfly appeals to classic Butterfly decompositions of the convolution to reduce GPU memory IO and increase FLOP utilization. FlashButterfly speeds up the LRA benchmark by 7.0× over Transformers, and allows us to train on Path256, a challenging task with sequence length 64K, where we set state-of-the-art by 29.1 points while training 7.2× faster than prior work. |
Dan Fu · Elliot Epstein · Eric Nguyen · Armin Thomas · Michael Zhang · Tri Dao · Atri Rudra · Christopher Re 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Understanding HTML with Large Language Models
(
Poster
)
link »
Large language models (LLMs) have shown exceptional performance on a variety of natural language tasks. Yet, their capabilities for HTML understanding -- i.e., parsing the raw HTML of a webpage, with applications to automation of web-based tasks, crawling, and browser-assisted retrieval -- have not been fully explored. We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks: (i) Semantic Classification of HTML elements, (ii) Description Generation for HTML inputs, and (iii) Autonomous Web Navigation of HTML pages. While previous work has developed dedicated architectures and training procedures for HTML understanding, we show that LLMs pretrained on standard natural language corpora transfer remarkably well to HTML understanding tasks. For instance, fine-tuned LLMs are 12% more accurate at semantic classification compared to models trained exclusively on the task dataset. Moreover, when fine-tuned on data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks using 192× less data compared to the previous best supervised model. Out of the LLMs we evaluate, we show evidence that T5-based models are ideal due to their bidirectional encoder-decoder architecture. To promote further research on LLMs for HTML understanding, we create and open-source a large-scale HTML dataset distilled and auto-labeled from CommonCrawl. |
Izzeddin Gur · Ofir Nachum · Yingjie Miao · Mustafa Safdari · Austin Huang · Aakanksha Chowdhery · SHARAN NARANG · Noah Fiedel · Aleksandra Faust 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Instruction-Finetuned Foundation Models for Multimodal Web Navigation
(
Poster
)
link »
We propose an instruction-aligned multimodal agent for autonomous web navigation -- i.e., sequential decision-making tasks employing a computer interface. Our approach is based on supervised finetuning of vision and language foundation models on a large corpus of web data consisting of webpage screenshots and HTML. Specifically, we use vision transformers on sequences of webpage screenshots to extract patch-level image features. These features are concatenated with embeddings of tokens in HTML documents. Using an instruction-finetuned large language model, we jointly encode both vision and HTML modalities and decode web actions such as click and type. We show that our method outperforms previous approaches by a significant margin, even in handling out-of-distribution HTML and compositional tasks. On the MiniWoB benchmark, we improve on previous approaches that use only HTML input by more than 17.7%, even surpassing the performance of RL-finetuned models. On the recent WebShop benchmark, our 3-billion-parameter model achieves superior performance to the existing state-of-the-art PaLM-540B. We also collect 347K gold demonstrations using our trained models, 29 times more than prior work, and make them available to promote future research in this area. We believe that our work is a step towards building capable and generalist decision-making agents for computer interfaces. |
Hiroki Furuta · Ofir Nachum · Kuang-Huei Lee · Yutaka Matsuo · Shixiang Gu · Izzeddin Gur 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Out-of-context Meta-learning in Large Language Models
(
Poster
)
link »
Brown et al. (2020) famously introduced the phenomenon of in-context meta-learning in large language models (LLMs). Our work establishes the existence of a phenomenon we call out-of-context meta-learning via carefully designed synthetic experiments with large language models. We argue that out-of-context meta-learning is an important and surprising capability of LLMs, which may lead them to more readily “internalize” the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and apply it in appropriate contexts. We also raise the question of how this phenomenon emerges, and discuss two possible explanations: one relying on the way LLMs store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based methods may be responsible. |
Dmitrii Krasheninnikov · Egor Krasheninnikov · David Krueger 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
What Happens to the Source Domain in Transfer Learning?
(
Poster
)
link »
We investigate the impact of the source domain in supervised transfer learning, focusing on image classification. In particular, we aim to assess to what extent a fine-tuned model can still recognize the classes of the source domain. Furthermore, we want to understand how this ability impacts the target domain. We demonstrate how the retained knowledge about the old classes in a popular foundation model can interfere with the model’s ability to learn and recognize the new classes. This interference can have significant implications and highlights an inherent shortcoming of supervised transfer learning. |
Amal Alnouri · Bilal Alsallakh 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Modality-Aware Adaptation of Contrastive Language-Image Models
(
Poster
)
link »
Despite their high levels of robustness, Contrastive Language-Image Models (CLIP) still require some form of downstream adaptation when applied to tasks sufficiently out-of-domain with respect to their training set. Recent methods propose light-weight adapters on the model features and show strong performance, primarily focused on the few-shot domain. All such approaches, however, require per-task hyperparameter tuning, which necessitates access to a validation set, limiting their applicability in practice. As an alternative, we propose Modality Aware Tangent-space Retrieval (MATeR), a training-free, interpretable adapter which outperforms all recent methods when per-task hyperparameter tuning is prohibited. MATeR considers the manifold formed by CLIP embeddings when incorporating out-of-domain few-shot class information and its predictions are invariant to the modality gap; it represents the first approach that considers the geometric structure of the CLIP latent space to inform downstream task adaptation. Additionally, we demonstrate that a variant of MATeR can significantly increase zero-shot accuracy with only a handful of unlabelled images, far fewer than the number of classes. |
Alexander Long · Thalaiyasingam Ajanthan · Anton Hengel 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns
(
Poster
)
link »
We present TabRet, a pre-trainable Transformer-based model for tabular data. TabRet is designed to work on a downstream task that contains columns not seen in pre-training. Unlike other methods, TabRet has an extra learning step before fine-tuning called retokenizing, which calibrates feature embeddings based on the masked autoencoding loss. In experiments, we pre-trained TabRet with a large collection of public health surveys and fine-tuned it on classification tasks in healthcare, and TabRet achieved the best AUC performance on four datasets. In addition, an ablation study shows retokenizing and random shuffle augmentation of columns during pre-training contributed to performance gains. |
Soma Onishi · Kenta Oono · Kohei Hayashi 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Do Video-Language Foundation Models have a Sense of Time?
(
Poster
)
link »
Modelling and understanding time remains a challenge in contemporary video understanding models. Time also appears in language through temporal relations. Video-language models can benefit from having a sense of time, especially since language provides an interface for generalization. In this paper, we consider a specific aspect of temporal understanding: consistency of time order as elicited by before/after relations. We construct a simple synthetic dataset to measure such temporal understanding in video-language models and find that six existing models struggle to understand even such simple relations. We then ask whether it is feasible to equip these foundation models with temporal awareness without re-training them from scratch. Towards this, we propose a temporal adaptation recipe on top of one such model, VideoCLIP, based on post-pretraining on a small amount of video-text data. Our work serves as a first step towards probing and instilling a sense of time in existing video-language models without needing data- and compute-intensive training from scratch. |
Piyush Nitin Bagad · Makarand Tapaswi · Cees G Snoek 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
What Contrastive Learning Learns Beyond Class-wise Features?
(
Poster
)
link »
In recent years, contrastive learning has achieved performance comparable to supervised learning in representation learning. However, the transferability of different contrastive learning methods to downstream tasks often varies greatly. In this paper, we study the downstream generalization ability of two contrastive learning methods: SimCLR and Spectral Contrastive Learning (Spectral CL). We find that, beyond class-wise features, contrastive learning also learns two other types of features, which we call shared features and subclass features, and that these play an important role in model transferability. SimCLR learns more shared and subclass features than Spectral CL, resulting in better transferability. We theoretically and experimentally reveal the mechanism by which SimCLR can learn more diverse features than Spectral CL. Based on this, we propose a method called High-pass Spectral CL to improve the transferability and generalization of Spectral CL, which achieves better performance than SimCLR and Spectral CL. |
Xingyuming Liu · Yifei Wang · Yisen Wang 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Look Globally and Locally: Inter-Intra Contrastive Learning from Unlabeled Videos
(
Poster
)
link »
State-of-the-art video contrastive learning methods spatiotemporally augment two clips from the same video as positives. By only sampling positive clips from the same video, these methods neglect other semantically related videos that can also be useful. To address this limitation, we leverage nearest-neighbor videos from the global space as additional positives, thus improving diversity and introducing a more relaxed notion of similarity that extends beyond video and even class boundaries. Our Inter-Intra Video Contrastive Learning (IIVCL) improves performance and generalization on video classification, detection, and retrieval tasks. |
David Fan · Deyu Yang · Xinyu Li · Vimal Bhat · Rohith MV 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Improving Foundation Models for Few-Shot Learning via Multitask Finetuning
(
Poster
)
link »
Foundation models have become essential tools for AI. In this paper, we study the problem of adapting foundation models, pre-trained using contrastive learning, to downstream tasks with limited labels. We explore the paradigm of finetuning a foundation model before adapting to a target task, using a set of related tasks with a few labeled samples. We show both theoretically and empirically that with a diverse set of related tasks this finetuning leads to reduced error in the target task, when compared with directly adapting the same pre-trained model, e.g., at least a 6% target accuracy improvement on miniImageNet. |
Zhuoyan Xu · Zhenmei Shi · Junyi Wei · Yin Li · Yingyu Liang 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
A Kernel-Based View of Language Model Fine-Tuning
(
Poster
)
link »
It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings. There is minimal theoretical understanding of empirical success, e.g., why fine-tuning a model with $10^8$ or more parameters on a couple dozen training points does not result in overfitting. We investigate whether the Neural Tangent Kernel (NTK)---which originated as a model to study the gradient descent dynamics of infinitely wide networks with suitable random initialization---describes fine-tuning of pre-trained LMs. This study was inspired by the decent performance of NTK for computer vision tasks (Wei et al., 2022). We extend the NTK formalism to Adam and use Tensor Programs (Yang, 2020) to characterize conditions under which the NTK lens may describe fine-tuning updates to pre-trained language models. Extensive experiments on 14 NLP tasks validate our theory and show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning. Finally, we use this kernel view to propose an explanation for the success of parameter-efficient subspace-based fine-tuning methods.
|
Sadhika Malladi · Alexander Wettig · Dingli Yu · Danqi Chen · Sanjeev Arora 🔗 |
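For a concrete handle on the kernel view, recall that the empirical NTK between two inputs is the inner product of the network's parameter gradients at those inputs; a small PyTorch sketch follows (the Tensor Programs machinery and the Adam extension discussed in the paper are not reproduced here, and the toy MLP is a placeholder for a pre-trained LM head).

```python
import torch

def empirical_ntk(model, x1, x2):
    """Empirical neural tangent kernel entry
    K(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>
    for a scalar-output model, evaluated at the current parameters."""
    def grad_vector(x):
        out = model(x).sum()  # scalar output
        grads = torch.autograd.grad(out, model.parameters())
        return torch.cat([g.reshape(-1) for g in grads])
    return torch.dot(grad_vector(x1), grad_vector(x2))

# Toy usage: a tiny MLP standing in for a pre-trained model.
net = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x_a, x_b = torch.randn(1, 8), torch.randn(1, 8)
print(empirical_ntk(net, x_a, x_b))
```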
Thu 4:00 a.m. - 5:00 a.m.
|
Variable Discretization for Self-Supervised Learning
(
Poster
)
link »
In this study, we propose Variable Discretization (VD) for self-supervised image representation learning. VD discretizes each variable in the embedding space, making their probability distributions estimable so that the learning process can be directly guided by information measures. Specifically, a loss function is defined to maximize the joint entropy between discrete variables. Our theoretical analysis guarantees that the entropy-maximized VD can learn transform-invariant, non-trivial, redundancy-minimized, and discriminative features. Extensive experiments demonstrate the superiority of VD on various downstream tasks in terms of both accuracy and training efficiency. Moreover, the VD-based information-theoretic optimization can be adapted to other learning paradigms and multimodal data representation learning. |
Chuang Niu · Wenjun Xia · Ge Wang 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs
(
Poster
)
link »
As foundation models continue to exponentially scale in size, efficient methods of adaptation become increasingly critical. Parameter-efficient fine-tuning (PEFT), a recent class of techniques that requires modifying only a small percentage of the model parameters, is currently the most popular method for adapting large language models (LLMs). Several PEFT techniques have recently been proposed with varying tradeoffs. We provide a comprehensive and uniform benchmark of various PEFT techniques across a representative LLM, the FLAN-T5 model, and evaluate model performance across different data scales of classification and generation datasets. Based on this, we provide a framework for choosing the optimal PEFT technique based on task type and data availability. Contrary to popular belief, we also empirically show that PEFT techniques converge more slowly and perform worse than full fine-tuning in low-data scenarios, and we characterize the amount of data required for PEFT methods to both perform well and converge efficiently. Lastly, we further optimize these PEFT techniques by selectively choosing which parts of the model to train, and find that they can be applied to significantly fewer parameters while maintaining model performance. |
George Pu · Anirudh Jain · Jihan Yin · Russell Kaplan 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
AWE: Adaptive weight-space ensembling for few-shot fine-tuning
(
Poster
)
link »
Transfer learning, which involves adapting a pre-trained model to perform a downstream task, is a widely used paradigm in machine learning. However, traditional transfer learning methods are typically designed for scenarios where fine-tuning data is abundant. Adapting such methods to the few-shot regime can be challenging because the quantity of data is limited compared to the model's capacity. In this work, we present a method called Adaptive Weight-space Ensembling (AWE) that demonstrates the effectiveness of weight-space ensembling, originally designed for large-scale data, in the few-shot setting. We achieve this by leveraging patterns in oracle weight-space ensembling to develop an adaptive ensembling method that can easily be deployed in practice. Our method achieves state-of-the-art results by more than 2% on average on standard few-shot setting benchmarks. |
Jean-Christophe Gagnon-Audet · David J Schwab · Ricardo Monti 🔗 |
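The basic weight-space ensembling ingredient that AWE builds on is linear interpolation between a pre-trained and a fine-tuned checkpoint; a generic sketch is below. The adaptive, data-driven choice of interpolation coefficients that gives AWE its name is not shown, and the toy models are placeholders.

```python
import copy
import torch

def interpolate_weights(pretrained, finetuned, alpha=0.5):
    """Return a model whose parameters are (1 - alpha) * pretrained + alpha * finetuned.
    Both models must share the same architecture."""
    merged = copy.deepcopy(pretrained)
    sd_pre, sd_ft = pretrained.state_dict(), finetuned.state_dict()
    merged.load_state_dict(
        {k: (1 - alpha) * sd_pre[k] + alpha * sd_ft[k] for k in sd_pre}
    )
    return merged

# Toy usage: interpolate between a "pre-trained" classifier and a perturbed copy
# standing in for its fine-tuned version.
base = torch.nn.Linear(10, 2)
tuned = copy.deepcopy(base)
with torch.no_grad():
    tuned.weight.add_(0.1 * torch.randn_like(tuned.weight))
mid = interpolate_weights(base, tuned, alpha=0.3)
```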
Thu 4:00 a.m. - 5:00 a.m.
|
Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
(
Poster
)
link »
Language models (LMs) perform a new task at test time either through zero-shot inference or few-shot in-context learning, i.e., conditioning on the k-shot training data (so-called demonstrations). Prior work suggests that in-context learning mainly activates the intrinsic ability of the LM. We argue that this implies the zero-shot performance of the LM is underestimated and can be as good as in-context learning if we inform the LM of the correct space of the inputs and the labels using pseudo-demonstrations. We also identify an additional factor which we call the copying effect: if the pseudo-demonstrations include an input that is very similar to the test input, the model prediction is heavily influenced by the paired label of that input. Putting it all together, we introduce Z-ICL, a new zero-shot prompting method that constructs pseudo-demonstrations without any training data that (a) informs the correct space of the inputs and the outputs and (b) reduces the copying effect so that the prediction is less affected by the pairings in the pseudo-demonstration. Z-ICL includes (a) leveraging nearest neighbors from a raw text corpus and pairing them with random but valid labels and (b) a set of techniques such as physical neighbors and synonym labeling. Z-ICL outperforms previous zero-shot methods by a significant margin, and is on par with in-context learning with gold training data on a range of text classification datasets. Together, Z-ICL provides a significantly higher estimate of the model’s ability to perform a new task zero-shot, and poses a set of new questions about the capacities of LMs. |
Xinxi Lyu · Sewon Min · Iz Beltagy · Luke Zettlemoyer · Hannaneh Hajishirzi 🔗 |
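A rough sketch of pseudo-demonstration construction as described above; `embed` is a hypothetical sentence-encoder interface, labels are assigned uniformly at random, and the paper's physical-neighbor and synonym-labeling refinements are omitted.

```python
import random
import numpy as np

def build_pseudo_demos(test_input, corpus, labels, embed, k=8):
    """Retrieve the k corpus sentences nearest to the test input and pair each
    with a random (but valid) label to form zero-shot pseudo-demonstrations.

    embed(list_of_texts) -> (N, d) array is an assumed sentence-encoder interface.
    """
    corpus_emb = embed(corpus)
    query_emb = embed([test_input])[0]
    # Cosine similarity between the query and every corpus sentence.
    sims = corpus_emb @ query_emb / (
        np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    nearest = np.argsort(-sims)[:k]
    demos = [(corpus[i], random.choice(labels)) for i in nearest]
    prompt = "\n".join(f"{x}\n{y}" for x, y in demos)
    return prompt + f"\n{test_input}\n"

# Toy usage with a dummy encoder in place of a real sentence embedder.
dummy_embed = lambda texts: np.random.randn(len(texts), 32)
print(build_pseudo_demos("great acting", ["fine film", "awful sound", "nice plot"],
                         ["positive", "negative"], dummy_embed, k=2))
```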
Thu 4:00 a.m. - 5:00 a.m.
|
Variational prompt tuning improves generalization of vision-language foundation models
(
Poster
)
link »
Using prompt tuning, large vision-language foundation models can be adapted to downstream tasks by treating part of the input language prompts as learnable parameters and freezing the rest. However, existing work on prompt tuning may damage the generalization capabilities of foundation models. To avoid such limitations, we propose a probabilistic modeling of the underlying distribution of prompts, allowing prompts within the support of an associated concept to be derived through stochastic sampling. This results in a more complete and richer transfer of the information captured by the language model, providing better generalization capabilities for downstream tasks. The resulting algorithm relies on a simple yet powerful variational framework that can be directly integrated with other developments. We show our approach is seamlessly integrated into both standard and conditional prompt learning frameworks, improving the performance in both cases considerably, especially with regard to preserving the generalization capability of the original model. Our method provides the current state-of-the-art for prompt learning, surpassing CoCoOp by 1.6% average Top-1 accuracy on the standard benchmark. Remarkably, it even surpasses the original CLIP model in terms of generalization to new classes. The implementation code will be released. |
Mohammad Mahdi Derakhshani · Enrique Sanchez · Adrian Bulat · Victor Guilherme Turrisi da Costa · Cees G Snoek · Georgios Tzimiropoulos · Brais Martinez 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Aligning Foundation Models for Language with Preferences through $f$-divergence Minimization
(
Poster
)
link »
Aligning language models with preferences can be posed as approximating a target distribution representing some desired behavior. Existing approaches differ both in the functional form of the target distribution and the algorithm used to approximate it. For instance, Reinforcement Learning from Human Feedback (RLHF) corresponds to minimizing a reverse KL from an implicit target distribution arising from a KL penalty in the objective. On the other hand, Generative Distributional Control (GDC) has an explicit target distribution and minimizes a forward KL from it using the Distributional Policy Gradient (DPG) algorithm. In this paper, we propose a new approach, $f$-DPG, which allows the use of any $f$-divergence to approximate any target distribution. $f$-DPG unifies both frameworks (RLHF, GDC) and the approximation methods (DPG, RL with KL penalties). We show the practical benefits of various choices of divergence objectives and demonstrate that there is no universally optimal objective but that different divergences are good for approximating different targets.
|
Dongyoung Go · Tomek Korbak · Germàn Kruszewski · Jos Rozen · Nahyeon Ryu · Marc Dymetman 🔗 |
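For readers unfamiliar with the divergence family referenced above, recall the standard definition (included here for convenience; $\pi$ denotes the language model being trained and $p$ the target distribution): $D_f(p \,\|\, \pi) = \mathbb{E}_{x \sim \pi}\left[ f\left( p(x)/\pi(x) \right) \right]$ for convex $f$ with $f(1) = 0$. Choosing $f(t) = t \log t$ recovers the forward KL $\mathrm{KL}(p \,\|\, \pi)$ minimized by GDC/DPG, while $f(t) = -\log t$ recovers the reverse KL $\mathrm{KL}(\pi \,\|\, p)$ associated with RLHF-style objectives.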
Thu 4:00 a.m. - 5:00 a.m.
|
The SSL Interplay: Augmentations, Inductive Bias, and Generalization
(
Poster
)
link »
Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study this interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory-friendly setup, and highlight several insights for SSL practitioners that arise from our theory. |
Vivien Cabannes · Bobak Kiani · Randall Balestriero · Yann LeCun · Alberto Bietti 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Retrieval of Soft Prompt Enhances Zero-Shot Task Generalization
(
Poster
)
link »
During zero-shot inference with language models (LMs), hard prompts alone may not fully describe the target task. In this paper, we explore how the retrieval of soft prompts obtained through prompt tuning can assist hard prompts in zero-shot task generalization. Specifically, we train soft prompt embeddings for each prompt through prompt tuning, store samples of the training instances (hard prompt + input instances) mapped to the prompt embeddings, and retrieve the prompt embedding of the training instance closest to the query instance during inference. Results show this simple approach enhances the performance of T0 on unseen tasks, outperforming it on 10 out of 11 datasets and improving the mean accuracy of T0 on the BIG-bench benchmark by 2.39 percentage points while adding only 0.007% additional parameters. Interpolating multiple embeddings and applying variance-based ranking further improve accuracy and robustness to different evaluation prompts, widening the performance gap. |
Seonghyeon Ye · Joel Jang · Doyoung Kim · Yongrae Jo · Minjoon Seo 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
(
Poster
)
link »
Instruction-tuning, which fine-tunes a language model (LM) on various downstream tasks with task instructions, has improved zero-shot task generalization performance. However, instruction-tuned LMs still struggle to generalize to challenging unseen tasks containing novel labels. In this paper, we propose Flipped Learning, an alternative method of instruction-tuning which trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as FLIPPED, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 11B-sized FLIPPED outperforms zero-shot T0-11B and even a 16-times-larger 3-shot GPT-3 (175B) on average by 8.4 and 9.7 percentage points, respectively. Flipped Learning gives particularly large improvements on tasks with unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped Learning comes from improved generalization to novel labels. |
Seonghyeon Ye · Doyoung Kim · Joel Jang · Joongbo Shin · Minjoon Seo 🔗 |
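A schematic of the flipped inference rule described above; `instruction_logprob` is a hypothetical stand-in for an instruction-tuned LM's log-likelihood of generating the task instruction given the input and a candidate label, and the prompt format is an assumption.

```python
def flipped_predict(task_instruction, test_input, label_options, instruction_logprob):
    """Flipped inference: pick the label under which the LM is most likely to
    generate the task instruction, rather than scoring the label directly.

    instruction_logprob(instruction, conditioning_text) -> float is an assumed
    interface to a model trained to map (input, label) -> instruction.
    """
    scores = {
        label: instruction_logprob(task_instruction, f"{test_input}\n{label}")
        for label in label_options
    }
    return max(scores, key=scores.get)

# Toy usage with a dummy scorer in place of a real instruction-tuned LM.
dummy = lambda instr, cond: -abs(len(instr) - len(cond))  # placeholder score
print(flipped_predict("Is the review positive or negative?",
                      "A wonderful, heartfelt film.",
                      ["positive", "negative"], dummy))
```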
Thu 4:00 a.m. - 5:00 a.m.
|
Project with Source, Probe with Target: Extracting Useful Features for Adaptation to Distribution Shifts
(
Poster
)
link »
Conventional approaches to robustness try to learn a model based on causal features. However, identifying maximally robust or causal features may be difficult in some scenarios, and in others, non-causal "shortcut" features may actually be more predictive. We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features with a small target dataset. Our approach, Project and Probe (Pro^2), first learns a linear projection that maps a pre-trained embedding onto orthogonal directions while being predictive of labels in the source dataset. The goal of this step is to learn a variety of predictive features, so that at least some of them remain useful after distribution shift. Pro^2 then learns a linear classifier on top of these projected features using a small target dataset. We theoretically show that Pro^2 learns a projection matrix that is optimal for classification in an information-theoretic sense, resulting in better generalization due to a favorable bias-variance tradeoff. Our experiments on eight distribution shift settings show that Pro^2 improves performance by 5-15% when given limited target data compared to prior methods such as standard linear probing. |
Annie Chen · Yoonho Lee · Amrith Setlur · Sergey Levine · Chelsea Finn 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
(
Poster
)
link »
Large pretrained language models have shown a surprising in-context learning (ICL) ability. With a few demonstration input-label pairs, they can predict labels for unseen inputs without parameter updates. Despite the great success in performance, its working mechanism still remains an open question. In this paper, we explain language models as meta-optimizers and understand ICL as implicit finetuning. Theoretically, we show that Transformer attention has a dual form of gradient descent. On top of this, we understand ICL as follows: GPT first produces meta-gradients according to the demonstration examples, and then these meta-gradients are applied to the original GPT to build an ICL model. We compare the behaviors of ICL and explicit finetuning on real tasks to provide empirical evidence that supports our understanding. Experimental results show that in-context learning behaves similarly to explicit finetuning from multiple perspectives. |
Damai Dai · Yutao Sun · Li Dong · Yaru Hao · Shuming Ma · Zhifang Sui · Furu Wei 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Towards Foundation Models with Mathematical Understanding
(
Poster
)
link »
We investigate the ability of transformer models to build representations of integer sequences that are of utility to tasks where deeper mathematical understanding is needed. To that end, we train BERT-like transformer encoders to assess the impact of individual pre-training tasks on the quality of the resulting model, and evaluate them for sequence classification, continuation, unmasking, complexity prediction, and next sequence-part prediction. We find that the models both outperform benchmark baselines and provide reasonable estimates of the complexity of the mathematical rules behind the sequences. |
Peter Belcak · Roger Wattenhofer 🔗 |
Thu 4:00 a.m. - 5:00 a.m.
|
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
(
Poster
)
link »
We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model. However, we show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions. Additionally, we demonstrate that under the PL model, the true MLE and an alternative MLE that splits the $K$-wise comparison into pairwise comparisons both converge. Moreover, the true MLE is asymptotically more efficient. Our results validate the empirical success of existing RLHF algorithms in InstructGPT and provide new insights for algorithm design. Furthermore, our results unify the problem of RLHF and max entropy Inverse Reinforcement Learning (IRL), and provide the first sample complexity bound for max entropy IRL. (The BTL and PL comparison models are written out after this listing.)
|
Banghua Zhu · Jiantao Jiao · Michael Jordan 🔗 |
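For reference, the two comparison models named above can be written as follows; the linear-reward notation is illustrative.

```latex
% Hedged sketch of the comparison models, with a linear reward
% r_\theta(x, y) = \langle \theta, \phi(x, y) \rangle (notation illustrative).
% Bradley-Terry-Luce (pairwise): response y_1 is preferred to y_0 with probability
\[
\mathbb{P}\big(y_1 \succ y_0 \mid x\big)
  = \frac{\exp\!\big(r_\theta(x, y_1)\big)}
         {\exp\!\big(r_\theta(x, y_0)\big) + \exp\!\big(r_\theta(x, y_1)\big)}
  = \sigma\!\big(r_\theta(x, y_1) - r_\theta(x, y_0)\big).
\]
% Plackett-Luce (K-wise): a full ranking \pi over K responses has probability
\[
\mathbb{P}\big(\pi \mid x\big)
  = \prod_{k=1}^{K}
    \frac{\exp\!\big(r_\theta(x, y_{\pi(k)})\big)}
         {\sum_{j=k}^{K} \exp\!\big(r_\theta(x, y_{\pi(j)})\big)},
\]
% and the MLE studied in the paper maximizes the log-likelihood of the observed
% comparisons under one of these models.
```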
Thu 4:00 a.m. - 5:00 a.m.
|
Broken Neural Scaling Laws
(
Poster
)
link »
We present a smoothly broken power law functional form that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e., how the evaluation metric of interest varies as the amount of compute used for training, the number of model parameters, the training dataset size, or upstream performance varies) for various architectures and for various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single agent and multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing, such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. (A minimal single-break instance of such a functional form is sketched after this listing.) |
Ethan Caballero · Kshitij Gupta · Irina Rish · David Krueger 🔗 |
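As a point of reference, here is a minimal single-break instance of a smoothly broken power law; the parameterization is an assumption for illustration, and the general functional form in the paper (which allows multiple breaks) may differ in details.

```python
# Hedged sketch of a smoothly broken power law with a single break, meant only
# to convey how one power-law regime can bend smoothly into another as x grows.
import numpy as np

def smoothly_broken_power_law(x, a, b, c0, c1, d, f):
    """y starts near a + b * x**-c0 and, around scale d, smoothly picks up an
    additional power-law decay with exponent c1; f controls break sharpness."""
    return a + b * x ** (-c0) * (1.0 + (x / d) ** (1.0 / f)) ** (-c1 * f)

x = np.logspace(0, 9, 200)                      # e.g. training compute or dataset size
y = smoothly_broken_power_law(x, a=0.05, b=1.0, c0=0.1, c1=0.3, d=1e5, f=0.5)
print(y[:3], y[-3:])                            # decays faster after the break near 1e5
```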
Thu 4:00 a.m. - 5:00 a.m.
|
Coordinating Multiple Vision-Language Models for Visual Reasoning
(
Poster
)
link »
Visual reasoning demands multimodal perception and commonsense cognition of the world. Multiple vision-language models (VLMs) have recently been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the collective power of these complementary VLMs is rarely explored. Existing methods like ensembling still struggle to combine these models with the desired higher-order communication. In this work, we propose COLA (code available at https://anonymous.4open.science/r/visualreasoning), a novel paradigm that coordinates multiple VLMs for visual reasoning. Our key insight is that a language model (LM) can serve as an efficient coordinator to leverage the distinct and complementary capabilities of multiple VLMs. Extensive experiments demonstrate that our finetuning variant, COLA-FT, achieves state-of-the-art performance on outside-knowledge VQA, visual entailment, and visual-spatial reasoning tasks. Through systematic ablation studies and visualizations, we validate that a coordinator LM comprehends the instruction prompts and the separate functionalities of VLMs and then coordinates them to enable impressive visual reasoning capabilities. (A toy sketch of the coordination pattern follows this listing.) |
Liangyu Chen · Bo Li · Sheng Shen · Jingkang Yang · Chunyuan Li · Kurt Keutzer · Trevor Darrell · Ziwei Liu 🔗 |
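The coordination pattern described above can be caricatured with a short sketch: each VLM reports what it sees, and a language model reads those reports and commits to an answer. Everything below — the stub VLM functions, the prompt template, and the toy coordinator — is a hypothetical stand-in, not the COLA implementation.

```python
# Hedged sketch of an LM-as-coordinator pattern: expose each VLM's output in a
# prompt and let a language model decide. All components below are stubs.
from typing import Callable

def vlm_captioner(image_path: str, question: str) -> str:
    # hypothetical stub: a captioning-style VLM would describe the image
    return "a brown dog jumping over a wooden fence"

def vlm_answerer(image_path: str, question: str) -> str:
    # hypothetical stub: a VQA-style VLM would give a direct (possibly noisy) answer
    return "yes"

def coordinate(image_path: str, question: str, lm: Callable[[str], str]) -> str:
    """Build a prompt exposing each VLM's output, and let the coordinator LM decide."""
    prompt = (
        f"Question: {question}\n"
        f"Model A (caption): {vlm_captioner(image_path, question)}\n"
        f"Model B (answer): {vlm_answerer(image_path, question)}\n"
        "Considering both models, the final answer is:"
    )
    return lm(prompt)

# Toy coordinator LM for illustration only; in practice this is a pretrained
# (and possibly finetuned) language model.
toy_lm = lambda prompt: "yes" if "yes" in prompt else "unknown"
print(coordinate("dog.jpg", "Is the animal outdoors?", toy_lm))
```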
Thu 5:00 a.m. - 5:05 a.m.
|
Diffusion Models are Minimax Optimal Distribution Estimators
(
Spotlight
)
link »
SlidesLive Video » We provide the first rigorous analysis of estimation error bounds for diffusion modeling over well-known function spaces. The highlight of this paper is that when the true density function belongs to a Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves nearly minimax optimal estimation rates in the total variation distance and in the Wasserstein distance of order one. We expect these results to advance theoretical understanding of diffusion modeling and its ability to generate verisimilar outputs. |
Kazusato Oko · Akiyama Shunta · Taiji Suzuki 🔗 |
Thu 5:08 a.m. - 5:13 a.m.
|
Text-to-Image Diffusion Models are Zero-Shot Classifiers
(
Spotlight
)
link »
SlidesLive Video » Text-to-image diffusion models have demonstrated remarkable generative capabilities, suggesting they learn informative representations of image-text data. However, their abilities are not fully understood and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers. The key idea is using a diffusion model's ability to denoise a noised image given a textual description of a label as a proxy for that label's likelihood. We apply our method to Imagen, using it to probe fine-grained aspects of Imagen's knowledge and comparing it with CLIP's zero-shot abilities. Imagen performs competitively with CLIP on a wide range of zero-shot image classification datasets. Additionally, it is more robust than CLIP and can successfully perform attribute binding, whereas CLIP cannot. Although generative pre-training is common in NLP, visual foundation models often use other methods such as contrastive learning. Based on our findings, we argue that generative pre-training should be explored as a compelling alternative for visual and vision-language problems. (A hedged sketch of the denoising-as-scoring idea follows this listing.) |
Kevin Clark · Priyank Jaini 🔗 |
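A hedged sketch of the scoring idea — use the denoising error under a class-conditioned prompt as a proxy for that class's likelihood — is below. The `eps_model` and `text_encoder` interfaces and the simplified forward process are assumptions; Imagen's actual API and noise schedule differ.

```python
# Hedged sketch: score each class prompt by how well the model denoises noised
# copies of the image when conditioned on that prompt; lowest error wins.
import torch

def diffusion_zero_shot_classify(image, class_prompts, eps_model, text_encoder,
                                 n_samples=8):
    """Lower average denoising error under a class prompt is treated as higher
    likelihood for that class."""
    scores = []
    for prompt in class_prompts:
        cond = text_encoder(prompt)                    # assumed: text -> conditioning
        errs = []
        for _ in range(n_samples):
            t = torch.rand(())                         # random noise level in (0, 1)
            noise = torch.randn_like(image)
            x_t = (1 - t) * image + t * noise          # simplified forward process
            pred = eps_model(x_t, t, cond)             # assumed: predicts the noise
            errs.append(torch.mean((pred - noise) ** 2).item())
        scores.append(sum(errs) / len(errs))
    return class_prompts[min(range(len(scores)), key=scores.__getitem__)]

# Toy usage with stand-ins; a real setup plugs in an actual text-conditional
# diffusion model and its noise schedule.
toy_text_encoder = lambda p: torch.zeros(4)
toy_eps_model = lambda x_t, t, cond: torch.zeros_like(x_t)
img = torch.randn(3, 8, 8)
print(diffusion_zero_shot_classify(img, ["a photo of a cat", "a photo of a dog"],
                                   toy_eps_model, toy_text_encoder))
```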
Thu 5:16 a.m. - 5:21 a.m.
|
Exploring Demonstration Ensembling for In-context Learning
(
Spotlight
)
link »
SlidesLive Video » In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for desired tasks, i.e., demonstrations. The standard approach for ICL is to prompt the LM with concatenated demonstrations followed by the test input. This approach suffers from some issues. First, concatenation offers almost no control over the contribution of each demonstration to the model prediction. This can be sub-optimal when some demonstrations are not very relevant to the test example. Second, due to the input length limit of transformer models, it can be infeasible to fit many examples into the context, especially when dealing with long-input tasks. In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation. DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and then combines the output probabilities resulting from each subset to produce the final prediction. We study different ensembling methods using GPT-J and experiment on 7 different language tasks. Our experiments show max ensembling to outperform concatenation by an average of 3.8 points. (A short sketch of the bucketing-and-combining procedure follows this listing.) |
Muhammad Khalifa · Lajanugen Logeswaran · Moontae Lee · Honglak Lee · Lu Wang 🔗 |
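The bucketing-and-combining procedure reads naturally as a few lines of code; the sketch below shows max ensembling with a hypothetical `lm_label_probs` scorer standing in for an actual LM such as GPT-J, and the round-robin bucketing is an illustrative choice.

```python
# Hedged sketch of demonstration ensembling: split the demonstrations into
# buckets, get label probabilities from the LM for each bucket separately, and
# combine them (max ensembling shown).
from typing import Callable, Dict, List

def dense_predict(demos: List[str], test_input: str, labels: List[str],
                  lm_label_probs: Callable[[str, str, List[str]], Dict[str, float]],
                  n_buckets: int = 3, combine: str = "max") -> str:
    buckets = [demos[i::n_buckets] for i in range(n_buckets)]      # round-robin split
    per_bucket = [lm_label_probs("\n".join(b), test_input, labels) for b in buckets]
    if combine == "max":          # max ensembling: best probability any bucket assigns
        scores = {y: max(p[y] for p in per_bucket) for y in labels}
    else:                         # simple alternative: average the bucket probabilities
        scores = {y: sum(p[y] for p in per_bucket) / len(per_bucket) for y in labels}
    return max(scores, key=scores.get)

# Toy usage with a stand-in scorer (a real setup would query an actual LM).
toy_scorer = lambda demo_text, x, labels: {y: float(len(demo_text) + len(x) + i)
                                           for i, y in enumerate(labels)}
print(dense_predict([f"example {i}" for i in range(6)], "test sentence",
                    ["positive", "negative"], toy_scorer))
```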
Thu 5:24 a.m. - 5:29 a.m.
|
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations
(
Spotlight
)
link »
SlidesLive Video » Joint-embedding-based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of their representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding-based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network, primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that it re-organizes the information to be more similar to pre-trained joint embedding models. |
Shashank Shekhar · Florian Bordes · Pascal Vincent · Ari Morcos 🔗 |
Thu 5:32 a.m. - 5:37 a.m.
|
Effective Data Augmentation With Diffusion Models
(
Spotlight
)
link »
SlidesLive Video » Data augmentation is one of the most prevalent tools in deep learning, underpinning many recent advances, including those in classification, generative modeling, and representation learning. The standard approach to data augmentation combines simple transformations like rotations and flips to generate new images from existing ones. However, these new images lack diversity along key semantic axes present in the data. Consider the task of recognizing different animals. Current augmentations fail to produce diversity in task-relevant high-level semantic attributes like the species of the animal. We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our approach on image classification tasks in a few-shot setting, and on a real-world weed recognition task, and observe an improvement in accuracy in the tested domains. (A generic image-to-image augmentation sketch follows this listing.) |
Brandon Trabucco · Kyle Doherty · Max Gurinas · Ruslan Salakhutdinov 🔗 |
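One way to approximate the described augmentation with off-the-shelf tools is an image-to-image diffusion edit that preserves the label. The sketch below uses the Hugging Face diffusers img2img pipeline as a generic stand-in; the checkpoint, prompt template, strength, and file paths are illustrative choices, and this is not the authors' released implementation.

```python
# Hedged sketch of diffusion-based semantic augmentation: edit a labelled image
# with an off-the-shelf image-to-image diffusion model while keeping its label.
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")  # or "cpu", much slower

def augment(image_path: str, class_name: str, strength: float = 0.5) -> Image.Image:
    """Return a semantically edited variant of the image that should keep its class."""
    init = Image.open(image_path).convert("RGB").resize((512, 512))
    prompt = f"a photo of a {class_name}"      # illustrative prompt template
    out = pipe(prompt=prompt, image=init, strength=strength, guidance_scale=7.5)
    return out.images[0]

# Usage (hypothetical paths): generate a few extra training images per example.
# aug = augment("weeds/example_0001.jpg", "datura weed"); aug.save("example_aug.png")
```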
Thu 5:40 a.m. - 5:45 a.m.
|
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
(
Spotlight
)
link »
SlidesLive Video » Instruction-tuning, which fine-tunes the language model (LM) on various downstream tasks with task instructions, has improved zero-shot task generalization performance. However, instruction-tuned LMs still struggle to generalize to challenging unseen tasks containing novel labels. In this paper, we propose Flipped Learning, an alternative method of instruction-tuning which trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as FLIPPED, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 11B-sized FLIPPED outperforms zero-shot T0-11B and even a 16-times-larger 3-shot GPT-3 (175B) on average by 8.4 and 9.7 percentage points, respectively. Flipped Learning gives particularly large improvements on tasks with unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped Learning comes from improved generalization to novel labels. |
Seonghyeon Ye · Doyoung Kim · Joel Jang · Joongbo Shin · Minjoon Seo 🔗 |
Thu 5:48 a.m. - 5:53 a.m.
|
A Kernel-Based View of Language Model Fine-Tuning
(
Spotlight
)
link »
SlidesLive Video »
It has become standard to solve NLP tasks by fine-tuning pre-trained language models (LMs), especially in low-data settings. There is minimal theoretical understanding of this empirical success, e.g., why fine-tuning a model with $10^8$ or more parameters on a couple dozen training points does not result in overfitting. We investigate whether the Neural Tangent Kernel (NTK), which originated as a model to study the gradient descent dynamics of infinitely wide networks with suitable random initialization, describes fine-tuning of pre-trained LMs. This study was inspired by the decent performance of NTK for computer vision tasks (Wei et al., 2022). We extend the NTK formalism to Adam and use Tensor Programs (Yang, 2020) to characterize conditions under which the NTK lens may describe fine-tuning updates to pre-trained language models. Extensive experiments on 14 NLP tasks validate our theory and show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning. Finally, we use this kernel view to propose an explanation for the success of parameter-efficient subspace-based fine-tuning methods. (The kernel object is written out after this listing.)
|
Sadhika Malladi · Alexander Wettig · Dingli Yu · Danqi Chen · Sanjeev Arora 🔗 |
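For context, the kernel object the abstract refers to can be written compactly; the notation below is illustrative, and the paper's precise conditions (and its Adam variant of the kernel) are more involved.

```latex
% Hedged sketch of the kernel view (notation illustrative). With pre-trained
% parameters \theta_0 and network output f(x; \theta), the empirical NTK is
\[
K(x, x') \;=\; \big\langle \nabla_\theta f(x; \theta_0),\; \nabla_\theta f(x'; \theta_0) \big\rangle,
\]
% and in the kernel regime fine-tuning behaves like the linearized model
\[
f(x; \theta) \;\approx\; f(x; \theta_0) \;+\; \nabla_\theta f(x; \theta_0)^{\top} (\theta - \theta_0),
\]
% i.e. learning reduces to (kernel) regression with K on the downstream data.
% The paper additionally derives an Adam analogue of this kernel and gives
% conditions, via Tensor Programs, under which the approximation can hold for
% prompted fine-tuning of pre-trained LMs.
```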
Thu 6:00 a.m. - 6:30 a.m.
|
Invited Talk (Yasaman Bahri): Understanding Neural Scaling Laws
(
Invited Talk
)
SlidesLive Video » Bio: Yasaman Bahri is a Research Scientist at Google Brain with research interests in the foundations of deep learning and the intersection of machine learning with the physical sciences. Prior to joining Google Brain, she completed her Ph.D. in Physics at UC Berkeley. She is a past recipient of the Rising Stars Award in EECS. |
Yasaman Bahri 🔗 |
Thu 6:30 a.m. - 6:35 a.m.
|
Q&A
|
🔗 |
Thu 6:35 a.m. - 7:05 a.m.
|
Invited Talk (Danqi Chen): Analyzing Training Objectives and Trajectories in Language Pre-training
(
Invited Talk
)
SlidesLive Video » In this talk, I will present several empirical studies on understanding and analyzing pre-training of language models. I will start with BERT’s pre-training/fine-tuning paradigm, and discuss how pre-training objectives will influence downstream performance. Then, I will move on to the scaling of autoregressive large language models. Through analyzing intermediate training checkpoints, we present several interesting findings on token-level perplexity, sentence-level generation and their correlation with in-context learning on downstream tasks. I hope these findings can encourage more theoretical understanding and improved pre-training in the future. Bio: Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. Her recent research focuses on training, adapting and understanding large language models, and developing scalable and efficient NLP systems for question answering, information extraction and conversational agents. Before joining Princeton, Danqi worked as a visiting scientist at Facebook AI Research. She received her Ph.D. from Stanford University (2018) and B.E. from Tsinghua University (2012), both in Computer Science. Her research was recognized by a Sloan Fellowship, an NSF CAREER award, a Samsung AI Researcher of the Year award, outstanding paper awards from ACL and EMNLP, and multiple industry faculty awards. |
Danqi Chen 🔗 |
Thu 7:05 a.m. - 7:10 a.m.
|
Q&A
|
🔗 |
Thu 7:10 a.m. - 7:40 a.m.
|
Invited Talk (Jonathan Frankle): Faster Neural Network Training, Algorithmically
(
Invited Talk
)
Training modern neural networks is time-consuming, expensive, and energy-intensive. As neural network training costs double every few months, it is difficult for researchers and businesses without immense budgets to keep up, especially as hardware improvements stagnate. In this talk, I will describe my favored approach for managing this challenge: changing the workload itself - the training algorithm. Unlike most workloads in computer science, machine learning is approximate, and we need not worry about changing the underlying algorithm so long as we properly account for the consequences. I will discuss how we have put this approach into practice at MosaicML, including the dozens of algorithmic changes we have studied (which are freely available open source), the science behind how these changes interact with each other (the composition problem), and how we evaluate whether these changes have been effective. I will also detail several surprises we have encountered and lessons we have learned along the way. In the time since we began this work, we have reduced the training times of standard computer vision models by 5-7x and standard language models by 2-3x, and we're just scratching the surface. I will close with a number of open research questions we have encountered that merit the attention of the research community. This is the collective work of a dozen empirical deep learning researchers at MosaicML, and I'm simply the messenger. Bio: Jonathan Frankle is Chief Scientist at MosaicML, where he leads the company's research team toward the goal of developing more efficient algorithms for training neural networks. In his PhD at MIT, he empirically studied deep learning with Prof. Michael Carbin, specifically the properties of sparse networks that allow them to train effectively (his "Lottery Ticket Hypothesis" - ICLR 2019 Best Paper). In addition to his technical work, he is actively involved in policymaking around challenges related to machine learning. He will be joining the computer science faculty at Harvard in the fall of 2023. He earned his BSE and MSE in computer science at Princeton and has previously spent time at Google Brain, Facebook AI Research, and Microsoft as an intern and Georgetown Law as an Adjunct Professor of Law. |
Jonathan Frankle 🔗 |
Thu 7:40 a.m. - 7:45 a.m.
|
Q&A
|
🔗 |