

Workshop

Navigating and Addressing Data Problems for Foundation Models (DPFM)

Ruoxi Jia · Tatsunori Hashimoto · Pang Wei Koh · Jerone Andrews · Sang Michael Xie · Lingjiao Chen · Myeongseob Ko · Feiyang Kang

Stolz 0

Sat 11 May, midnight PDT

Foundation Models (FMs, e.g., GPT-3/4, LLaMA, DALL-E, Stable Diffusion) have achieved sweeping success on a wide range of tasks. As researchers strive to keep up with the capabilities, limitations, and implications of these rapidly evolving models, attention is shifting to the emerging notion of data-centric AI. The curation of training data has been shown to be crucially important for the performance and reliability of FMs, and a wealth of recent work demonstrates that data-perspective research points to a promising direction for addressing critical issues such as safety, alignment, efficiency, security, privacy, and interpretability. Recent years have seen a surge of individual works exploring many frontiers of this topic, which makes this an excellent opportunity to bring researchers together to search for a systematic framework and roadmap for research.

This workshop aims to build a better understanding of the new paradigm for research on data problems for foundation models. Our technical agenda is composed of four modules with 12 confirmed speakers:

- A. Data Quality, Dataset Curation, and Data Generation: Recent Achievements and Current Efforts
- B. A Data Perspective to Efficiency, Interpretability, and Alignment: Latest Advancement and Breakthroughs
- C. A Data Perspective to Safety and Ethics: Risks, Limitations, and Opportunities
- D. Copyright, Legal Issues, and Data Economy: A Broader Landscape

We strive to build a community behind this essential topic. Because the current data practices of foundation models are largely opaque, one mission of this workshop is to foster a community effort around open-source data at the pretraining stage itself. Subsequent efforts include creating datasets, benchmarks, and dedicated venues to promote research on data problems for foundation models, and ultimately to facilitate the widespread deployment of FMs in a sociotechnically friendly way that is broadly beneficial.

Examples of our target communities include researchers on data problems (e.g., data-centric AI, dataset/data curation, data markets) and on foundation models (alignment, safety/trustworthiness, fairness/ethics), practitioners of downstream applications, tech companies providing innovative solutions, and beyond.

Timezone: America/Los_Angeles

Schedule