Poster in Workshop: SCOPE: Scalable Optimization for Efficient and Adaptive Foundation Models
UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices
Seul-Ki Yeom · Tae-Ho Kim
Keywords: [ Attention Mechanism ] [ Memory Optimization ] [ Efficient Transformers ]
Transformer-based architectures excel across various domains but face challenges on edge devices due to high memory and computational demands. In this paper, we introduce a novel Reuse Attention mechanism, tailored for efficient memory access and computational optimization, enabling seamless operation on resource-constrained platforms without compromising performance. Comprehensive experiments on ImageNet-1K and downstream tasks show that our UniForm models, built on Reuse Attention, achieve competitive accuracy while significantly improving inference speed and memory efficiency. Notably, UniForm-l reaches 76.7% Top-1 accuracy on ImageNet-1K with 21.8ms inference time on Jetson AGX Orin, delivering up to a 5× speedup over competing methods. These results highlight the broad applicability of Reuse Attention across GPUs and edge platforms, enabling real-time deployment in resource-constrained environments.
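The abstract does not spell out how Reuse Attention is implemented. Purely as illustration, the PyTorch sketch below shows one plausible way an attention map could be computed once and reused across all value heads to cut per-head memory traffic; the module name, dimensions, and structure are assumptions for this sketch, not the authors' code.

```python
import torch
import torch.nn as nn


class ReuseAttentionSketch(nn.Module):
    """Illustrative sketch (not the paper's implementation): compute a single
    softmax attention map from one shared Q/K pair and reuse it for every
    value head, avoiding per-head QK^T computation and storage."""

    def __init__(self, dim: int, num_heads: int = 4, qk_dim: int = 16):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = qk_dim ** -0.5
        # One shared Q/K projection (assumption: a single attention map serves all heads).
        self.to_qk = nn.Linear(dim, 2 * qk_dim, bias=False)
        # Per-head value projection kept at full width.
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        q, k = self.to_qk(x).chunk(2, dim=-1)              # (B, N, qk_dim) each
        attn = (q @ k.transpose(-2, -1)) * self.scale       # one (B, N, N) map
        attn = attn.softmax(dim=-1)
        v = self.to_v(x).reshape(B, N, self.num_heads, self.head_dim)
        v = v.permute(0, 2, 1, 3)                           # (B, H, N, head_dim)
        # Reuse the same attention map for every head (broadcast over H).
        out = attn.unsqueeze(1) @ v                         # (B, H, N, head_dim)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Usage example with hypothetical sizes.
x = torch.randn(2, 196, 64)
y = ReuseAttentionSketch(dim=64, num_heads=4)(x)
print(y.shape)  # torch.Size([2, 196, 64])
```

The memory saving in this sketch comes from materializing a single (N x N) attention map instead of one per head; whether and how UniForm realizes this is detailed in the full paper, not in this abstract.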