Pretrained Language Model in Continual Learning: A Comparative Study

Tongtong Wu · Massimo Caccia · Zhuang Li · Yuan-Fang Li · Guilin Qi · Gholamreza Haffari

Keywords: [ continual learning ]

Wed 27 Apr 6:30 p.m. PDT — 8:30 p.m. PDT


Continual learning (CL) is a setting in which a model learns from a stream of incoming data while avoiding forgetting previously learned knowledge. Pre-trained language models (PLMs) have been successfully employed in continual learning for a range of natural language problems. With the rapid development of many continual learning methods and PLMs, understanding and disentangling their interactions has become essential for continued improvement of continual learning performance. In this paper, we thoroughly compare continual learning performance across combinations of 5 PLMs and 4 CL approaches on 3 benchmarks in 2 typical incremental settings. Our extensive experimental analyses reveal interesting performance differences across PLMs and across CL methods. Furthermore, our representativeness probing analyses dissect PLMs’ performance characteristics in a layer-wise and task-wise manner, uncovering the extent to which their inner layers suffer from forgetting and the effect of different CL approaches on each layer. Finally, our observations and analyses open up a number of important research questions that will inform and guide the design of effective continual learning techniques.
