Latent Action Reparameterization for Efficient Agent Inference
Abstract
Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in long effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, LAR learns latent actions from agent trajectories and integrates them directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly shortens the effective action horizon under fixed compute budgets, yielding substantial reductions in action-token counts and corresponding wall-clock inference time while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.
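The abstract leaves LAR's learning procedure unspecified; purely to illustrate the reparameterization idea, the following is a minimal PyTorch sketch of one plausible instantiation, in which a fixed-length window of low-level actions is pooled and vector-quantized (VQ-VAE-style) against a learned codebook of latent actions. The class name LatentActionCodebook, the window, num_codes, and dim parameters, and the choice of a GRU pooler with a straight-through quantizer are all assumptions made for this sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class LatentActionCodebook(nn.Module):
    """Hypothetical sketch: map a window of K low-level actions to one
    discrete latent action. A decoder that expands a latent code back into
    low-level actions (needed for execution) is omitted for brevity."""

    def __init__(self, num_actions, num_codes=256, dim=128, window=4):
        super().__init__()
        self.window = window
        self.embed = nn.Embedding(num_actions, dim)        # low-level action embeddings
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # pools the K-step window
        self.codebook = nn.Embedding(num_codes, dim)       # latent action vocabulary

    def forward(self, actions):
        # actions: (batch, window) integer ids of low-level actions
        assert actions.size(1) == self.window
        h, _ = self.encoder(self.embed(actions))
        z = h[:, -1]                                       # summary vector of the window
        # nearest codebook entry, with a straight-through gradient as in VQ-VAE
        dists = torch.cdist(z.unsqueeze(1), self.codebook.weight.unsqueeze(0)).squeeze(1)
        codes = dists.argmin(dim=-1)                       # (batch,) discrete latent actions
        z_q = self.codebook(codes)
        z_q = z + (z_q - z).detach()                       # straight-through estimator
        return codes, z_q

# Usage: a 12-step low-level trajectory collapses to 3 latent decisions,
# shrinking the effective decision horizon by the window factor.
model = LatentActionCodebook(num_actions=1000)
traj = torch.randint(0, 1000, (1, 12))
codes = torch.cat([model(traj[:, i:i + 4])[0] for i in range(0, 12, 4)])
print(codes.shape)  # torch.Size([3]) -- 12 raw actions -> 3 latent actions
```

Under this sketch, shortening the horizon falls directly out of the window factor; the paper's actual encoder, codebook size, and mechanism for integrating latent actions into the LLM's planning and execution may differ.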