Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces
Abstract
Agentic systems operating over large tool ecosystems must plan and execute long-horizon workflows while learning from weak or non-verifiable supervision. Frontier models address this through scale and large context budgets, but small language models (SLMs) remain brittle: eager tool loading saturates context, execution errors compound, and sparse rewards limit learning. We introduce ATLAS, a reinforcement finetuning framework that enables SLMs to operate effectively in large toolspaces by learning both context acquisition and action execution. We treat context control and execution structure as learnable decisions, combining iterative tool loading with programmatic orchestration to bound context growth and stabilize trajectories. We further propose rubric-based reinforcement finetuning, decomposing task success into structured criteria to enable scalable training with small judge models. Across MCP benchmarks, these design choices yield large, consistent gains over generic RL baselines, allowing a 4B SLM to approach frontier-agent performance under substantially tighter parameter and context budgets.
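To make the idea of decomposing task success into structured criteria concrete, the following is a minimal sketch of how a rubric-based reward could be aggregated. It is an illustration under stated assumptions, not the ATLAS implementation: the names `Criterion`, `rubric_reward`, and `keyword_judge`, the per-criterion weights, and the weighted-mean aggregation are all hypothetical, and the keyword-matching judge is only a stand-in for a small judge model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item. The description format and weights are illustrative."""
    description: str
    weight: float = 1.0


# Assumed judge interface: (trajectory, criterion description) -> score in [0, 1].
Judge = Callable[[str, str], float]


def rubric_reward(trajectory: str, rubric: Sequence[Criterion], judge: Judge) -> float:
    """Return the weighted mean of per-criterion judge scores, in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric must have positive total weight")
    score = 0.0
    for c in rubric:
        # Clip each judge score so one noisy judgment cannot dominate the reward.
        s = min(1.0, max(0.0, judge(trajectory, c.description)))
        score += c.weight * s
    return score / total_weight


def keyword_judge(trajectory: str, criterion: str) -> float:
    """Toy stand-in for a judge model: passes if the keyword after
    'mentions:' appears in the trajectory text."""
    keyword = criterion.split("mentions:", 1)[-1].strip()
    return 1.0 if keyword in trajectory else 0.0


# Hypothetical two-item rubric: loading a tool counts less than finishing the task.
example_rubric = [
    Criterion("mentions: load_tool", weight=1.0),
    Criterion("mentions: final_answer", weight=3.0),
]
```

With this sketch, `rubric_reward("load_tool(search); search(q)", example_rubric, keyword_judge)` yields 0.25: the trajectory satisfies only the lower-weighted criterion, so it earns partial credit rather than the all-or-nothing zero that a single sparse success signal would give.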