TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen · Yue Chen · Yuliang Xiu · Andreas Geiger · Anpei Chen
Abstract
Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear complexity in sequence length. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information against adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code will be made publicly available.
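To make the test-time-training view of the memory update concrete, the sketch below illustrates one plausible form of a confidence-gated recurrent state update, assuming an associative (fast-weight) memory and a cosine-similarity notion of alignment confidence; the function name, tensor shapes, and gating rule are illustrative assumptions, not the released TTT3R implementation.

```python
# Hypothetical sketch of a confidence-gated recurrent memory update viewed as
# online learning: the state acts as fast weights, and the per-step learning
# rate is derived from how well the current memory explains the new observation.
import torch


def update_memory(memory: torch.Tensor,   # (B, D_v, D_k) associative state
                  key: torch.Tensor,      # (B, D_k) key of incoming observation
                  value: torch.Tensor):   # (B, D_v) value of incoming observation
    # Readout: what the current memory predicts for this key.
    pred = (memory @ key.unsqueeze(-1)).squeeze(-1)                      # (B, D_v)
    # Alignment confidence in [0, 1]: high when memory and observation agree.
    conf = torch.cosine_similarity(pred, value, dim=-1).clamp(min=0.0)   # (B,)
    # Closed-form step size: adapt strongly on disagreement, retain on agreement.
    lr = (1.0 - conf).view(-1, 1, 1)
    outer = torch.einsum('bv,bk->bvk', value, key)                       # rank-1 write
    return (1.0 - lr) * memory + lr * outer


# Toy usage: stream many observations through the recurrent state,
# mimicking inference far beyond the training context length.
if __name__ == "__main__":
    B, D_k, D_v = 2, 64, 64
    memory = torch.zeros(B, D_v, D_k)
    for _ in range(1000):
        key, value = torch.randn(B, D_k), torch.randn(B, D_v)
        memory = update_memory(memory, key, value)
    print(memory.shape)
```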