Fri 11:50 p.m. - 12:00 a.m. | Opening Remarks (Intro)

Sat 12:00 a.m. - 12:30 a.m. | Invited Talk #1 - Bridging the Gap Between Pre-training Data and Alignment [Speaker: Mike Lewis (Meta AI)] (Invited Talk)
Mike Lewis

Sat 12:30 a.m. - 12:45 a.m. | Best Paper Oral Presentation #1 - Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data For Language Models [Speaker: Ken Liu (Stanford University)] (Oral Presentation)
Ken Liu

Sat 12:45 a.m. - 1:00 a.m. | Best Paper Oral Presentation #2 - The Science of Data Filtering: Data Curation cannot be Compute Agnostic [Speakers: Sachin Goyal & Pratyush Maini (CMU)] (Oral Presentation)
Sachin Goyal · Pratyush Maini

Sat 1:00 a.m. - 2:00 a.m. | Poster Session I & Coffee Break (ALL posters) (Poster Session)

Sat 2:00 a.m. - 2:30 a.m. | Invited Talk #2 - A data-centric view on reliable generalization: From ImageNet to LAION-5B & DataComp [Speaker: Ludwig Schmidt (Anthropic, Stanford, and U Washington)] (Invited Talk)
Ludwig Schmidt

Sat 2:30 a.m. - 2:45 a.m. | Best Paper Oral Presentation #3 - VideoCon: Robust Video-Language Alignment via Contrast Captions [Speaker: Hritik Bansal (UCLA)] (Oral Presentation)
Hritik Bansal

Sat 2:45 a.m. - 3:00 a.m. | Best Paper Oral Presentation #4 - What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety [Speaker: Luxi He (Princeton University)] (Oral Presentation)
Luxi He

Sat 3:00 a.m. - 4:00 a.m. | Lunch Break (Lunchtime)

Sat 4:00 a.m. - 4:30 a.m. | Invited Talk #3 - Making “GPT-Next” Trustworthy Through Data [Speaker: Eric Wallace (OpenAI)] (Invited Talk)
Eric Wallace

Sat 4:30 a.m. - 4:45 a.m. | Best Paper Oral Presentation #5 - Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis [Speaker: Lukas Struppek (TU Darmstadt)] (Oral Presentation)
Lukas Struppek

Sat 4:45 a.m. - 5:00 a.m. | Best Paper Oral Presentation #6 - Computational Copyright: Towards A Royalty Model for AI Music Generation Platforms [Speaker: Jiaqi Ma (UIUC)] (Oral Presentation)
Jiaqi Ma

Sat 5:00 a.m. - 6:00 a.m. | Poster Session II & Coffee Break (ALL posters) (Poster Session)

Sat 6:00 a.m. - 6:30 a.m. | Invited Talk #4 - Characterizing Machine Unlearning through Definitions and Implementations [Speaker: Nicolas Papernot (University of Toronto & Vector Institute)] (Invited Talk)
Nicolas Papernot

Sat 6:30 a.m. - 7:00 a.m. | Invited Talk #5 - Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models [Speaker: Luke Zettlemoyer (U Washington/Meta)] (Invited Talk)
Luke Zettlemoyer

Sat 7:00 a.m. - 7:30 a.m. | Interactive Panel Discussion (Panel Discussion)

Sat 7:30 a.m. - 7:35 a.m. | Closing Remarks (Remarks)

Label-free Neural Semantic Image Synthesis (Poster)
Jiayi Wang · Kevin Laube · Yumeng Li · Jan Hendrik Metzen · Shin-I Cheng · Julio Borges · Anna Khoreva

What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety (Poster)
Luxi He · Mengzhou Xia · Peter Henderson

Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data For Language Models (Poster)
Minhao Jiang · Ken Liu · Ming Zhong · Rylan Schaeffer · Siru Ouyang · Jiawei Han · Sanmi Koyejo

Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models (Poster)
Zachary Ankner · Cody Blakeney · Kartik Sreenivasan · Max M Marion · Matthew Leavitt · Mansheej Paul

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models (Poster)
Yuancheng Xu · Jiarui Yao · Manli Shu · Yanchao Sun · Zichu Wu · Ning Yu · Tom Goldstein · Furong Huang

[Online Presentation] Distributional Dataset Distillation with Subtask Decomposition (Poster)
Tian Qin · Zhiwei Deng · David Alvarez-Melis

Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates (Poster)
Avanika Narayan · Mayee Chen · Kush Bhatia · Christopher Re

Evaluating Large Language Models in an Emerging Domain: A Pilot Study in Decentralized Finance (Poster)
Joshua Pearlson · Xiaoyuan Liu · Chengsong Huang · Kripa George · Dawn Song · Chenguang Wang

QuRating: Selecting High-Quality Data for Training Language Models (Poster)
Alexander Wettig · Aatmik Gupta · Saumya Malik · Danqi Chen

Toward Data-driven Skill Identification for General-purpose Vision-language Models (Poster)
Anthony Tiong · Junqi Zhao · Junnan Li · Steven Hoi · Caiming Xiong · Boyang Albert Li

TOFU: A Task of Fictitious Unlearning for LLMs (Poster)
Pratyush Maini · Zhili Feng · Avi Schwarzschild · Zachary Lipton · J Kolter

Incentivizing Inclusive Data Contributions in Personalized Federated Learning (Poster)
Enpei Zhang · Jingyi Chai · Rui Ye · Yanfeng Wang · Siheng Chen

How to Craft Backdoors with Unlabeled Data Alone? (Poster)
Yifei Wang · Wenhan Ma · Stefanie Jegelka · Yisen Wang

Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines (Poster)
Yuchen Li · Alexandre Kirchmeyer · Aashay Mehta · Yilong Qin · Boris Dadachev · Kishore Papineni · Sanjiv Kumar · Andrej Risteski

Feedback-guided Data Synthesis for Imbalanced Classification (Poster)
Reyhane Askari Hemmat · Mohammad Pezeshki · Florian Bordes · Michal Drozdzal · Adriana Romero-Soriano

Efficient Global Data Attribution for Diffusion Models (Poster)
MingYu Lu · Chris Lin · Su-In Lee

Scalable Data Extraction from Retrieval-Augmented Generation Systems (Poster)
Zhenting Qi · Hanlin Zhang · Eric Xing · Sham Kakade · Hima Lakkaraju

Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases (Poster)
Elad Levi · Eli Brosh · Matan Friedmann

A Tale of Tails: Model Collapse as a Change of Scaling Laws (Poster)
Yunzhen Feng · Elvis Dohmatob · Pu Yang · François Charton · Julia Kempe

AdaDemo: Adaptive Online Demonstration Expansion for Multi-task Visual Policy Learning (Poster)
Tongzhou Mu · Yijie Guo · Jie Xu · Ankit Goyal · Hao Su · Dieter Fox · Animesh Garg

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models (Poster)
Yupan Huang · Zaiqiao Meng · Fangyu Liu · Yixuan Su · Nigel Collier · Yutong Lu

Autonomous Data Selection with Language Models for Mathematical Texts (Poster)
Yifan Zhang · Yifan Luo · Yang Yuan · Andrew Yao

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models (Poster)
Wanru Zhao · Yaxin Du · Nic Lane · Siheng Chen · Yanfeng Wang

Multimodal Dataset Upgrading: a New Challenge for Data Annotation (Poster)
Haiwen Huang · Dan Zhang · Andreas Geiger

On the Scalability of GNNs for Molecular Graphs (Poster)
Maciej Sypetkowski · Frederik Wenkel · Farimah Poursafaei · Nia Dickson · Karush Suri · Philip Fradkin · Dominique Beaini

Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis (Poster)
Lukas Struppek · Dominik Hintersdorf · Felix Friedrich · Manuel Brack · Patrick Schramowski · Kristian Kersting

Improving Practical Counterfactual Fairness with Limited Causal Knowledge (Poster)
Zeyu Zhou · Ruqi Bai · David Inouye

Vision-Language Dataset Distillation (Poster)
Xindi Wu · Byron Zhang · Zhiwei Deng · Olga Russakovsky

Data Alignment for Zero-Shot Concept Generation in Dermatology AI (Poster)
Soham Gadgil · Mahtab Bigverdi

CollabEdit: Towards Non-destructive Collaborative Knowledge Editing (Poster)
Jiamu Zheng · Jinghuai Zhang · Futing Wang · Tianyu Du · Tao Lin

Scaling Laws for Downstream Task Performance of Large Language Models (Poster)
Berivan Isik · Natalia Ponomareva · Hussein Hazimeh · Dimitris Paparas · Sergei Vassilvitskii · Sanmi Koyejo

Hallucination Augmented Recitations for Language Models (Poster)
Abdullatif Köksal · Renat Aksitov · Chung-Ching Chang

LongForm: Effective Instruction Tuning with Reverse Instructions (Poster)
Abdullatif Köksal · Timo Schick · Anna Korhonen · Hinrich Schuetze

Model & Data Insights using Pre-trained Language Models (Poster)
Saeid Asgari · Aliasghar Khani · Amir Khasahmadi · Aditya Sanghi · Karl Willis · Ali Mahdavi Amiri

LESS: Selecting Influential Data for Targeted Instruction Tuning (Poster)
Mengzhou Xia · Sadhika Malladi · Suchin Gururangan · Sanjeev Arora · Danqi Chen

Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL (Poster)
Yongjin Yang · Sihyeon Kim · SangMook Kim · Gyubok Lee · Se-Young Yun · Edward Choi

Virtual Classifier: A Reversed Approach for Robust Image Evaluation (Poster)
Jizhe Zhang · Yifei Wang · Yisen Wang

[Online Presentation] DELE: Data Efficient LLM Evaluation (Poster)
Gayathri Saranathan · Mahammad Parwez Alam · James Lim · Suparna Bhattacharya · Soon Wong · Martin Foltin · Cong Xu

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution (Poster)
Alex Gu · Baptiste Roziere · Hugh Leather · Armando Solar-Lezama · Gabriel Synnaeve · Sida Wang

Computational Copyright: Towards A Royalty Model for AI Music Generation Platforms (Poster)
Junwei Deng · Jiaqi Ma

Prompt Optimization with Logged Bandit Data (Poster)
Haruka Kiyohara · Yuta Saito · Daniel Cao · Thorsten Joachims

The Science of Data Filtering: Data Curation cannot be Compute Agnostic (Poster)
Sachin Goyal · Pratyush Maini · Zachary Lipton · Aditi Raghunathan · J Kolter

West-of-N: Synthetic Preference Generation for Improved Reward Modeling (Poster)
Alizée Pace · Jonathan Mallinson · Eric Malmi · Sebastian Krause · Aliaksei Severyn

Data Debiasing via Model-free Data Pruning (Poster)
Lei Hsiung · Yaoqing Yang

Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget (Poster)
Florian Eddie Dorner · Moritz Hardt

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (Poster)
Pratyush Maini · Skyler Seto · He Bai · David Grangier · Yizhe Zhang · Navdeep Jaitly

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (Poster)
Avi Singh · John Co-Reyes · Rishabh Agarwal

Pre-training Concept Frequency is predictive of CLIP Zero-shot Performance (Poster)
Vishaal Udandarao · Ameya Prabhu · Philip Torr · Adel Bibi · Samuel Albanie · Matthias Bethge

Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models (Poster)
Hritik Bansal · John Dang · Aditya Grover

VideoCon: Robust Video-Language Alignment via Contrast Captions (Poster)
Hritik Bansal · Yonatan Bitton · Idan Szpektor · Kai-Wei Chang · Aditya Grover

Augmenting Math Word Problems via Iterative Question Composing (Poster)
Haoxiong Liu · Yifan Zhang · Yifan Luo · Andrew Yao

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning (Poster)
Rui Ye · WenHao Wang · Jingyi Chai · Dihan Li · Zexi Li · Yinda Xu · Yaxin Du · Yanfeng Wang · Siheng Chen