TextBO: Bayesian Optimization in Language Space for Eval-Efficient Self-Improving AI
Abstract
Large Language Models (LLMs) have enabled self-improving AI systems that iteratively generate, evaluate, and refine their outcomes. Recent studies show that prompt-optimization-based self-improvement can outperform state-of-the-art reinforcement-learning fine-tuning of LLMs, but performance is typically measured by \emph{generation} efficiency. In many applications, however, the binding constraint is \emph{evaluation} efficiency: obtaining reliable feedback is far more costly than generating candidates. In this paper, we propose \textsc{TextBO}, a self-improving algorithm that achieves evaluation efficiency by provably emulating gradient-based UCB-BO in language space. We empirically validate \textsc{TextBO} on automated ad-alignment tasks and agentic AI tasks, demonstrating superior performance per evaluation compared to \textsc{GEPA}. We also evaluate \textsc{TextBO}’s \textsc{Best-of-N} multi-step textual-gradient mechanism on agentic AI benchmarks by augmenting \textsc{GEPA} with it, and show that the augmented variant significantly outperforms standard \textsc{GEPA}. The full paper is available at \href{https://arxiv.org/abs/2511.12063}{https://arxiv.org/abs/2511.12063}.
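For readers unfamiliar with the UCB-BO procedure that \textsc{TextBO} is said to emulate, the following is a minimal sketch of standard numeric UCB-BO on a one-dimensional toy objective under a fixed evaluation budget. It is not the paper's algorithm and does not operate in language space. The Gaussian-process surrogate, the RBF kernel, the argmax over a finite candidate set (used here in place of gradient ascent on the acquisition function), and all names and defaults (`rbf`, `solve`, `gp_posterior`, `ucb_bo`, `beta`, `length_scale`, `noise`) are illustrative assumptions.

```python
import math

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance (assumed choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at a single point x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x) for a in xs]
    w = solve(K, k_star)  # w = K^{-1} k_star
    mean = sum(wi * yi for wi, yi in zip(w, ys))
    var = 1.0 - sum(wi * ki for wi, ki in zip(w, k_star))
    return mean, math.sqrt(max(var, 0.0))

def ucb_bo(objective, candidates, n_evals=10, beta=2.0):
    """Spend exactly n_evals objective evaluations, choosing each new point
    by maximizing the UCB acquisition mean + beta * std over the candidates."""
    xs = [candidates[len(candidates) // 2]]  # arbitrary starting point
    ys = [objective(xs[0])]
    while len(ys) < n_evals:
        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean + beta * std
        x_next = max(candidates, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))  # the only costly step
    best = max(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], len(ys)

# Toy usage: a cheap stand-in for an expensive evaluation.
toy = lambda x: -(x - 0.7) ** 2
grid = [i / 100 for i in range(101)]
best_x, best_y, used = ucb_bo(toy, grid, n_evals=10)
print(best_x, best_y, used)
```

The sketch calls the objective only once per iteration, while the surrogate is queried at every candidate. This mirrors the setting described above, in which generating candidates is cheap and evaluating them is the scarce resource.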