Poster in Workshop: SCOPE: SCALABLE OPTIMIZATION FOR EFFICIENT AND ADAPTIVE FOUNDATION MODELS
PENCIL: Long Thoughts with Short Memory
Chenxiao Yang · Nathan Srebro · David McAllester · Zhiyuan Li
Keywords: [ space efficiency ] [ large language models ] [ chain-of-thought ]
We introduce PENCIL to address the fundamental write-only limitation of chain-of-thought (CoT) by incorporating a reduction mechanism into the generation process, which allows the model to automatically clean up intermediate computations that would otherwise accumulate indefinitely in the context. With the reduction mechanism, the maximal context length during generation is reduced from the time complexity of solving the problem, which is often exponential for hard tasks, to the actual space required, which is polynomial. By using space efficiently, PENCIL can generate longer thoughts with limited memory and thus scale to larger problems given a larger inference-time budget. Notably, PENCIL achieves 97% accuracy on the challenging Einstein's puzzle using a 25M-parameter transformer with a 2048-token context length, with an average thinking time of 43 seconds. Theoretically, we show PENCIL can perform universal computation by simulating a Turing machine with optimal time and space complexity.
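To make the reduction mechanism concrete, below is a minimal Python sketch of how generation could interleave with reduction so that finished intermediate work is erased from the context. The marker tokens `[CALL]`, `[SEP]`, `[RETURN]`, the rule form `C [CALL] T [SEP] A [RETURN] -> C A`, and the `model_step` interface are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of a PENCIL-style reduction step (not the authors' code).
# Assumed convention: "[CALL]" opens a sub-computation, "[SEP]" separates its
# intermediate thoughts T from its answer A, and "[RETURN]" closes it; on
# "[RETURN]", the thoughts T are erased, keeping only context C and answer A.

def reduce_context(tokens: list[str]) -> list[str]:
    """Apply the reduction  C [CALL] T [SEP] A [RETURN]  ->  C A  once,
    triggered by a trailing [RETURN] token."""
    if not tokens or tokens[-1] != "[RETURN]":
        return tokens
    if "[SEP]" not in tokens or "[CALL]" not in tokens:
        return tokens
    sep = max(i for i, t in enumerate(tokens) if t == "[SEP]")
    call = max(i for i, t in enumerate(tokens[:sep]) if t == "[CALL]")
    context = tokens[:call]       # C: everything before the sub-computation
    answer = tokens[sep + 1:-1]   # A: the result; [RETURN] itself is dropped
    return context + answer


def generate_with_pencil(model_step, prompt: list[str], max_steps: int = 10_000):
    """Interleave next-token generation with reduction so the working context
    stays close to the space actually needed, not the total thinking time."""
    tokens = list(prompt)
    for _ in range(max_steps):
        nxt = model_step(tokens)  # hypothetical: returns the next token
        tokens.append(nxt)
        if nxt == "[RETURN]":
            tokens = reduce_context(tokens)  # discard finished intermediate work
        if nxt == "<eos>":
            break
    return tokens
```

Because the most recent `[CALL]`/`[SEP]` pair is reduced first, nested sub-computations collapse from the inside out, so the live context only ever holds the work still in progress.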