

Poster

Plug, Play, and Generalize: Length Extrapolation with Pointer-Augmented Neural Memory

Svetha Venkatesh · Kien Do · Hung Le · Truyen Tran · Dung Nguyen

Hall 3 + Hall 2B #130
Fri 25 Apr, midnight – 2:30 a.m. PDT

Abstract:

We introduce Pointer-Augmented Neural Memory (PANM), a versatile module designed to improve neural networks' ability to process symbols and to generalize to longer data sequences. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques, emulating the symbol-processing abilities of humans and computers. By explicitly employing physical pointers for memory access, PANM supports operations such as pointer assignment, dereferencing, and pointer arithmetic. The module can be trained end-to-end on sequence data and plugs into a wide range of sequential models, from simple recurrent networks to large language models (LLMs). Our experiments demonstrate PANM's strong length extrapolation and show that it improves recurrent neural networks on symbol-processing tasks, including algorithmic reasoning and Dyck language recognition. PANM enables Transformers to reach up to 100% generalization accuracy on compositional learning tasks and yields significant gains in mathematical reasoning, question answering, and machine translation. Notably, the generalization benefit scales with stronger backbone models: LLMs finetuned with PANM show substantial improvements on tasks with sequences 10 to 100 times longer than those seen during training.
