Workshop

SCOPE: SCALABLE OPTIMIZATION FOR EFFICIENT AND ADAPTIVE FOUNDATION MODELS

Souvik Kundu · Tianlong Chen · Shiwei Liu · Haizhong Zheng · Amir Yazdanbakhsh · Beidi Chen · Yingyan Celine Lin

Peridot 204-205

Sun 27 Apr, 5:30 p.m. PDT

In the rapidly evolving landscape of AI, there is significant demand for scalable optimization methods that yield efficient and adaptive foundation models, particularly for inference serving. Specifically, making models efficient while keeping them adaptable to new downstream tasks poses several challenges. First, for a model to quickly learn adaptive and efficient sub-model selection across tasks, it must support continual weight updates, compute- and memory-efficient fine-tuning, and personalized adaptation. Second, with the increased demand for long-context understanding and reasoning, the model needs to achieve such efficient adaptation while making effective use of query-specific token fetching. For instance, imagine a model that continually learns from current news events, adapting to the ever-changing global landscape by integrating up-to-date knowledge. Such a model may need not only efficient fine-tuning on the incoming data stream, but also efficient handling of a KV cache that keeps growing as the context to be handled gets longer. Additionally, integrating retrieval-augmented generation (RAG) into foundation models can ensure that generated content is not only relevant but also reflects the most current knowledge, at the cost of a larger prefill. Third, with this growing demand for contextual adaptation, mixture-of-experts (MoE) models, which can perform test-time adaptation via a learned routing policy, have also gained significant traction. In addition, the emergence of sub-quadratic models with constant-size KV states, as opposed to the growing KV cache of transformers, has opened a new avenue for model adaptation through retaining information in compressive KV states. These capabilities rely on techniques for adapting foundation models, including fine-tuning, conversion, distillation, and in-context/few-shot learning.
This workshop aims to capture advances in scalable, adaptive fine-tuning, calibration, and conversion that yield inference-efficient quadratic and sub-quadratic foundation models, focusing on methodologies across vision, language, and multi-modal domains.

Timezone: America/Los_Angeles
