Towards Sampling Data Structures for Tensor Products in Turnstile Streams
Zhao Song ⋅ Shenghao Xie ⋅ Samson Zhou
Abstract
This paper addresses the computational challenges of large-scale attention-based models in artificial intelligence by introducing new sampling methods in the streaming setting. Motivated by the classical definition of the $\ell_2$ sampler and recent progress on attention schemes in Large Language Models (LLMs), we propose the notion of an attention sampler. An attention sampler efficiently selects the important coordinates in the attention computation, bypassing the quadratic cost of materializing the entire attention matrix. We establish theoretical guarantees for the attention sampler, including bounds on its space usage and update time. Our framework also scales well and applies broadly across model architectures and domains.
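For concreteness, recall the standard guarantee of the classical $\ell_2$ sampler that motivates this definition (stated here as well-known background rather than a result of this paper): given a vector $x \in \mathbb{R}^n$ maintained under turnstile updates, an approximate $\ell_2$ sampler outputs an index $i \in [n]$ with probability
$$\Pr[i] = (1 \pm \epsilon) \cdot \frac{x_i^2}{\|x\|_2^2} + n^{-c}$$
for an accuracy parameter $\epsilon > 0$ and a large constant $c > 0$. The attention sampler asks for an analogous guarantee with respect to the entries of the attention matrix, without ever computing that matrix in full.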