

Poster

From Decoupling to Adaptive Transformation: a Wider Optimization Space for PTQ

Zhaojing Wen · Qiulin Zhang · Yuan Zhang · Rudan Chen · Xichao Yang · Di Xie · Jiang Zhu

Hall 3 + Hall 2B #356
[ Project Page ]
Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Post-training low-bit quantization (PTQ) is an efficient way to accelerate DNNs, and current state-of-the-art methods mostly adopt feature reconstruction with self-distillation finetuning. However, at extremely low bitwidths, we find that the current reconstruction optimization space is suboptimal. Considering all possible parameters and the often-ignored fact that the integer weights are fixed before actual inference, we thoroughly explore alternative optimization spaces via quant-step decoupling and identify a wider PTQ optimization space that consistently yields a better optimum. Based on this, we propose an Adaptive Quantization Transformation (AdaQTransform) for PTQ reconstruction, which applies an adaptive per-channel transformation so that the quantized output feature better fits its FP32 counterpart, thereby achieving lower feature reconstruction error. It incurs negligible extra finetuning cost and no extra inference cost. Building on AdaQTransform, we establish, for the first time, a general quantization setting paradigm that subsumes current PTQ, QAT, and other potential formulations. Experiments demonstrate that AdaQTransform expands the PTQ optimization space and helps existing PTQ methods find a better optimum across CNNs, ViTs, LLMs, and image super-resolution networks, e.g., it improves NWQ by 5.7% on ImageNet for W2A2-MobileNet-v2.
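To make the core idea concrete, below is a minimal sketch of a per-channel transformation applied to a quantized layer's output during block-wise reconstruction. The affine form, the `scale`/`shift` parameter names, and the PyTorch framing are illustrative assumptions based only on the abstract, not the paper's exact formulation; in practice such per-channel factors could be folded into the dequantization step, which would be consistent with the stated zero extra inference cost.

```python
import torch
import torch.nn as nn

class AdaptiveChannelTransform(nn.Module):
    """Hypothetical per-channel affine transform on the quantized output.

    Learnable per-channel parameters are finetuned together with the other
    quantization parameters during feature reconstruction, so that the
    transformed quantized output better matches the FP32 output.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_channels))
        self.shift = nn.Parameter(torch.zeros(num_channels))

    def forward(self, quant_out: torch.Tensor) -> torch.Tensor:
        # quant_out: (N, C, H, W) output of the quantized layer.
        s = self.scale.view(1, -1, 1, 1)
        b = self.shift.view(1, -1, 1, 1)
        return quant_out * s + b


def reconstruction_loss(quant_out: torch.Tensor,
                        fp32_out: torch.Tensor,
                        transform: AdaptiveChannelTransform) -> torch.Tensor:
    # Minimize the error between the transformed quantized output and the
    # full-precision output of the same block (self-distillation target).
    return torch.mean((transform(quant_out) - fp32_out) ** 2)
```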
