WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

Renkun Ni · Hong-Min Chu · Oscar Castaneda · Ping-yeh Chiang · Christoph Studer · Tom Goldstein

Keywords: [ efficient inference ] [ quantization ]


Low-precision neural networks represent both weights and activations with few bits, drastically reducing the cost of multiplications. Meanwhile, these products are accumulated using high-precision (typically 32-bit) additions. Additions dominate the arithmetic complexity of inference in quantized (e.g., binary) nets, and high precision is needed to avoid overflow. To further optimize inference, we propose WrapNet, an architecture that adapts neural networks to use low-precision (8-bit) additions while achieving classification accuracy comparable to their 32-bit counterparts. We achieve resilience to low-precision accumulation by inserting a cyclic activation layer that makes results invariant to overflow. We demonstrate the efficacy of our approach using both software and hardware platforms.
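
The overflow-invariance idea can be illustrated with a small sketch. It assumes an 8-bit signed accumulator that wraps around in two's complement, and it uses a triangle wave with period 2^8 as a stand-in cyclic activation. The function names and the triangle-wave shape are illustrative assumptions, not necessarily the activation used in WrapNet. Because the wrapped sum and the exact sum differ only by a multiple of 256, any activation with period 256 returns the same value for both.

```python
import random

BITS = 8          # accumulator width assumed for this sketch
MOD = 1 << BITS   # 256: overflow wraps modulo this value
HALF = MOD >> 1   # 128


def wrap_to_signed(x):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (x + HALF) % MOD - HALF


def accumulate_wrapped(products):
    """Sum products with an accumulator that overflows after every addition."""
    acc = 0
    for p in products:
        acc = wrap_to_signed(acc + p)
    return acc


def cyclic_activation(x):
    """Hypothetical cyclic activation: a triangle wave with period 256.

    Any function with period 2^BITS would be overflow-invariant in the same way.
    """
    return HALF - abs((x % MOD) - HALF)


# Simulate a dot product of binary weights and 2-bit activations.
rng = random.Random(0)
products = [rng.choice([-1, 1]) * rng.randrange(4) for _ in range(5000)]

exact = sum(products)                   # high-precision accumulation
wrapped = accumulate_wrapped(products)  # 8-bit accumulation with overflow

# The two sums agree modulo 256, so the cyclic activation cannot tell them apart.
assert (exact - wrapped) % MOD == 0
assert cyclic_activation(exact) == cyclic_activation(wrapped)
```

A non-periodic activation such as ReLU would not have this property: it would generally give different outputs for the exact and the wrapped sum whenever overflow occurs.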
