From Examples to Solutions: A Cognitive Framework for LLM Code Generation
Abstract
When learning to solve coding problems, humans rarely approach a new challenge from scratch. Instead, they study worked examples that reveal solution patterns and edge cases. This cognitive strategy, well documented in educational psychology, has been largely overlooked in training LLMs for code generation. In this work, we ask: can incorporating worked examples as explicit intermediate representations improve LLM code generation via reinforcement learning? We introduce COACH (COgnitive Abstraction Conditioning for code Help), a framework that decomposes code generation into two stages: an example generator that produces step-by-step solved examples for a given problem, and a solution generator that conditions on these examples to produce code. Both models are trained jointly using Group Relative Policy Optimization (GRPO) and receive the same execution-based reward signal. This shared reward incentivizes the example generator to produce examples that genuinely aid solution generation rather than superficial reasoning. On the MBPP benchmark, COACH achieves 49% pass@1 accuracy compared to 37% for vanilla GRPO, a 32% relative improvement. COACH is also more sample-efficient, reaching performance comparable to the baseline with less than two-fifths of the training data. Qualitatively, we find that COACH’s intermediate representations help the model handle edge cases that end-to-end approaches miss. Our results suggest that explicitly modeling human reasoning patterns, specifically the use of “worked examples” as reasoning scaffolds, offers a promising direction for more effective and interpretable code generation systems.
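As a rough illustration of the shared reward described above, the following Python sketch runs the two stages in sequence, scores each resulting program by executing it against tests, and converts the scores into group-relative advantages that both generators would receive. It is a minimal sketch under stated assumptions, not the COACH implementation: the function names, the fraction-of-tests-passed reward, the group size, and the normalization by group mean and standard deviation are all illustrative choices not specified in this abstract.

```python
from statistics import mean, pstdev
from typing import Callable, List, Tuple


def execution_reward(code: str, tests: List[str]) -> float:
    """Fraction of assert-style tests that pass (assumed reward definition).

    Uses exec() for brevity; real training would run candidates in a sandbox.
    """
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        env: dict = {}
        try:
            exec(code, env)   # define the candidate solution
            exec(test, env)   # run a single assertion against it
            passed += 1
        except Exception:
            pass              # any error counts as a failed test
    return passed / len(tests)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within a sampled group (GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def coach_rollout(
    problem: str,
    tests: List[str],
    example_generator: Callable[[str], str],        # hypothetical stage 1
    solution_generator: Callable[[str, str], str],  # hypothetical stage 2
    group_size: int = 4,
) -> List[Tuple[str, str, float]]:
    """Sample a group of (examples, code) pairs and attach one advantage to each.

    The same advantage would weight the policy-gradient update of both
    generators, which is what "shared reward" means in this sketch.
    """
    samples = []
    for _ in range(group_size):
        examples = example_generator(problem)
        code = solution_generator(problem, examples)
        samples.append((examples, code, execution_reward(code, tests)))
    advantages = group_relative_advantages([r for _, _, r in samples])
    return [(ex, code, adv) for (ex, code, _), adv in zip(samples, advantages)]
```

Because the example generator is rewarded only through the code that the solution generator writes from its examples, examples that do not help produce passing code receive below-average advantage within their group.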