Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning
Deqian Kong ⋅ Minglu Zhao ⋅ Aoyang Qin ⋅ Bo Pang ⋅ Chenxin Tao ⋅ David Hartmann ⋅ Edouardo Honig ⋅ Dehong Xu ⋅ Amit Kumar ⋅ Matthew Sarte ⋅ Chuan Li ⋅ Jianwen Xie ⋅ Yingnian Wu
Abstract
Standard chain-of-thought reasoning generates a solution in a single forward pass, committing irrevocably to each token and lacking a mechanism to recover from early errors. We introduce Inference-Time Rethinking, a generative framework that enables iterative self-correction by decoupling declarative latent thought vectors from procedural generation. We factorize reasoning into a continuous latent thought vector (what to reason about) and a decoder that verbalizes the trace conditioned on this vector (how to reason). Beyond serving as a declarative buffer, latent thought vectors compress the reasoning structure into a continuous representation that abstracts away surface-level token variability, making gradient-based optimization over reasoning strategies well-posed. Our prior model maps unstructured noise to a learned manifold of valid reasoning patterns, and at test time we employ a Gibbs-style procedure that alternates between generating a candidate trace and optimizing the latent vector to better explain that trace, effectively navigating the latent manifold to refine the reasoning strategy. Training a 0.2B-parameter model from scratch on GSM8K, our method with 30 rethinking iterations surpasses baselines with 10–15× more parameters, including a 3B counterpart. This result demonstrates that effective mathematical reasoning can emerge from sophisticated inference-time computation rather than solely from massive parameter counts.
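The alternating test-time procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear Gaussian "decoder" (a hypothetical matrix `W` and noise scale `sigma`) in place of the trained language-model decoder, and a standard normal prior in place of the learned prior model. All function names are made up for illustration. The sketch shows only the loop structure: sample a candidate trace from the current latent vector, then run gradient ascent on the latent vector so that it better explains that trace.

```python
import random


def decode_mean(W, z):
    """Toy decoder mean: the matrix-vector product W z (assumed stand-in for a real decoder)."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def log_joint(z, x, W, sigma):
    """log p(x | z) + log p(z) up to constants, with x ~ N(W z, sigma^2 I) and z ~ N(0, I)."""
    mean = decode_mean(W, z)
    fit = sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / sigma**2
    prior = sum(zj**2 for zj in z)
    return -0.5 * (fit + prior)


def refine_latent(z, x, W, sigma, steps=50):
    """Gradient ascent on log_joint with respect to z, holding the trace x fixed."""
    # The squared Frobenius norm upper-bounds the squared spectral norm of W,
    # so this step size is at most 1/L for the concave quadratic objective
    # and each step cannot decrease it.
    frob_sq = sum(w * w for row in W for w in row)
    lr = 1.0 / (frob_sq / sigma**2 + 1.0)
    for _ in range(steps):
        mean = decode_mean(W, z)
        resid = [xi - mi for xi, mi in zip(x, mean)]
        grad = [
            sum(W[i][j] * resid[i] for i in range(len(W))) / sigma**2 - z[j]
            for j in range(len(z))
        ]
        z = [zj + lr * g for zj, g in zip(z, grad)]
    return z


def rethink(W, sigma, n_iters, rng):
    """Alternate between sampling a candidate trace and refining the latent vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]  # initial latent from the toy prior
    history = []
    for _ in range(n_iters):
        # Step 1: generate a candidate trace conditioned on the current latent.
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in decode_mean(W, z)]
        # Step 2: optimize the latent so that it better explains that trace.
        z_new = refine_latent(z, x, W, sigma)
        history.append((log_joint(z, x, W, sigma), log_joint(z_new, x, W, sigma)))
        z = z_new
    return z, history


rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
z_final, history = rethink(W, sigma=0.5, n_iters=30, rng=rng)
```

Each entry of `history` holds the toy log joint before and after the latent update for one iteration. In this simplified setting the update never lowers that score. How the actual method scores traces and selects a final answer is not specified in the abstract and is not modeled here.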