

Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models

On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval

Kaiyue Wen · Xingyu Dang · Kaifeng Lyu


Abstract: This paper investigates the limitations of Recurrent Neural Networks (RNNs) on algorithmic tasks, particularly in comparison with Transformers. Focusing on IsTree, a reasoning task that asks whether a given graph is a tree, we demonstrate that RNNs with $o(n)$ parameters, even with Chain-of-Thought (CoT), cannot solve this task on graphs of size $n$, unlike Transformers, which can solve it with CoT and only $O(\log n)$-bit parameters. Our experiments confirm this representation gap. To overcome this limitation, we propose augmenting RNNs with in-context retrieval capabilities, specifically via regular expressions. This enhancement enables RNNs to solve IsTree and other algorithmic problems in $\mathsf{P}$ while maintaining their memory efficiency, closing the gap with Transformers.
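To make the IsTree task concrete, here is a minimal sketch (not taken from the paper) of the standard decision procedure: an undirected graph on $n$ vertices is a tree iff it has exactly $n-1$ edges and is connected. The function name and edge-list input format are illustrative assumptions.

```python
from collections import deque

def is_tree(n, edges):
    """Return True iff the undirected graph on vertices 0..n-1 given by
    the edge list is a tree (connected with exactly n - 1 edges)."""
    if len(edges) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS from vertex 0 to check connectivity.
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                queue.append(v)
    return all(seen)

# A path on 4 vertices is a tree; closing it into a cycle is not.
print(is_tree(4, [(0, 1), (1, 2), (2, 3)]))          # True
print(is_tree(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # False
```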
