Poster in Workshop: SCOPE: Scalable Optimization for Efficient and Adaptive Foundation Models
UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices
Seul-Ki Yeom · Tae-Ho Kim
Keywords: [ Attention Mechanism ] [ Memory Optimization ] [ Efficient Transformers ]
Transformer-based architectures excel across various domains but face challenges on edge devices due to high memory and computational demands. In this paper, we introduce a novel Reuse Attention mechanism, tailored for efficient memory access and computational optimization, enabling seamless operation on resource-constrained platforms without compromising performance. Comprehensive experiments on ImageNet-1K and downstream tasks show that our UniForm models, built on Reuse Attention, achieve competitive accuracy while significantly improving inference speed and memory efficiency. Notably, UniForm-l reaches 76.7% Top-1 accuracy on ImageNet-1K with 21.8ms inference time on Jetson AGX Orin, delivering up to a 5× speedup over competing methods. These results highlight the broad applicability of Reuse Attention across GPUs and edge platforms, enabling real-time deployment in resource-constrained environments.
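The abstract does not spell out how Reuse Attention is implemented. Purely as illustration, the PyTorch sketch below shows one plausible way an attention map could be computed once and reused across all value heads to cut per-head memory traffic; the module name, dimensions, and structure are assumptions for this sketch, not the authors' code.

```python
import torch
import torch.nn as nn


class ReuseAttentionSketch(nn.Module):
    """Illustrative sketch (not the paper's implementation): compute a single
    softmax attention map from one shared Q/K pair and reuse it for every
    value head, avoiding per-head QK^T computation and storage."""

    def __init__(self, dim: int, num_heads: int = 4, qk_dim: int = 16):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = qk_dim ** -0.5
        # One shared Q/K projection (assumption: a single attention map serves all heads).
        self.to_qk = nn.Linear(dim, 2 * qk_dim, bias=False)
        # Per-head value projection kept at full width.
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        q, k = self.to_qk(x).chunk(2, dim=-1)              # (B, N, qk_dim) each
        attn = (q @ k.transpose(-2, -1)) * self.scale       # one (B, N, N) map
        attn = attn.softmax(dim=-1)
        v = self.to_v(x).reshape(B, N, self.num_heads, self.head_dim)
        v = v.permute(0, 2, 1, 3)                           # (B, H, N, head_dim)
        # Reuse the same attention map for every head (broadcast over H).
        out = attn.unsqueeze(1) @ v                         # (B, H, N, head_dim)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Usage example with hypothetical sizes.
x = torch.randn(2, 196, 64)
y = ReuseAttentionSketch(dim=64, num_heads=4)(x)
print(y.shape)  # torch.Size([2, 196, 64])
```

The memory saving in this sketch comes from materializing a single (N x N) attention map instead of one per head; whether and how UniForm realizes this is detailed in the full paper, not in this abstract.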