

Poster

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Zhenheng Tang · Xiang Liu · Qian Wang · Peijie Dong · Bingsheng He · Xiaowen Chu · Bo Li

Hall 3 + Hall 2B #542
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Motivated by reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted considerable attention from researchers. However, current methods predominantly emphasize maintaining the performance of compressed LLMs, as measured by perplexity or simple accuracy, on tasks involving commonsense question answering and basic arithmetic reasoning. In this blog, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose the lottery LLM hypothesis: for a given LLM and task, there exists a smaller lottery LLM capable of achieving the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on this review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, which are currently overlooked in existing methods.
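A minimal formal sketch of the hypothesis may help fix ideas; the notation here (original model f_orig, compressed model g_small, external tools T, multi-step reasoning procedure A, performance measure Perf) is assumed for illustration and is not taken verbatim from the poster:

\[
\exists\, g_{\text{small}},\ |g_{\text{small}}| \ll |f_{\text{orig}}|:\quad
\mathbb{E}_{q \sim Q}\!\left[\operatorname{Perf}\!\big(\mathcal{A}(g_{\text{small}}, \mathcal{T}, q)\big)\right]
\;\ge\;
\mathbb{E}_{q \sim Q}\!\left[\operatorname{Perf}\!\big(f_{\text{orig}}(q)\big)\right],
\]

where Q is the task distribution. In words, the compressed model alone need not match the original; it only needs to match it once it is allowed to orchestrate retrieval, multi-step reasoning, and tool calls.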
