Decomposing ARC Programs to Create Simpler Tasks
Abstract
We introduce a program-driven method to augment the ARC (Abstraction and Reasoning Corpus) training set by decomposing ground-truth DSL solutions at split points: locations where a function returns an intermediate grid that fully captures all prior computation. From each split point, we synthesize two new tasks: (1) the left subprogram, with the intermediate grid as the target, and (2) the right subprogram, with that intermediate grid as the input. By applying this procedure recursively and keeping only variable- and dependency-safe splits, our pipeline produces tasks that are both grounded in the original ARC distribution and conceptually simpler than their parents. Applied to the original training set, our method yields 366 new unique tasks; when layered on top of Butt's CodeIt mutations (20,000 tasks), it produces 6,011 unique tasks (of which 3,634 are distinct programs). Generated tasks have shorter programs on average, consistent with DSL-program length being a proxy for difficulty; they include 3,310 left-programs and 2,701 right-programs, with 947 additional inter-split derivations from repeated application. Qualitative case studies show that the decomposition often isolates natural "mental steps" in ARC problems, suggesting a route to explicit curricula for solvers. We discuss limitations (the current implementation handles only explicitly typed grid-returning functions and is conservative in splitting) and outline extensions (static type inference for the Hodel DSL, more aggressive program compression, and empirical evaluation of solver improvements). Our method complements existing task-generation techniques by producing interpretable, stepwise tasks that can help probe and train program- and model-based ARC solvers.
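The split-point idea can be illustrated with a minimal sketch. It is not the paper's pipeline: the straight-line program representation, the helper names (`run`, `is_grid`, `split_points`), and the toy functions standing in for DSL primitives are all assumptions made for illustration. In particular, the sketch detects grids with a runtime check on one example, whereas the described implementation relies on explicitly typed grid-returning functions.

```python
from typing import Callable, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]
# One step assigns `out = fn(*args)`; args name earlier variables or the input "I".
Step = Tuple[str, Callable, Tuple[str, ...]]

# Toy stand-ins for DSL primitives (hypothetical, for illustration only).
def vmirror(g: Grid) -> Grid:
    return tuple(row[::-1] for row in g)

def hconcat(a: Grid, b: Grid) -> Grid:
    return tuple(ra + rb for ra, rb in zip(a, b))

def transpose(g: Grid) -> Grid:
    return tuple(zip(*g))

def run(steps: List[Step], grid: Grid, input_name: str = "I"):
    """Execute a straight-line program and return the value of its last variable."""
    env = {input_name: grid}
    for out, fn, args in steps:
        env[out] = fn(*(env[a] for a in args))
    return env[steps[-1][0]]

def is_grid(value) -> bool:
    """Runtime grid check (an assumption; the paper uses explicit types)."""
    return (isinstance(value, tuple) and len(value) > 0
            and all(isinstance(r, tuple) and all(isinstance(c, int) for c in r)
                    for r in value))

def split_points(steps: List[Step], example: Grid) -> List[int]:
    """Indices k where step k yields a grid and no later step needs
    the original input or any variable defined before step k."""
    env = {"I": example}
    points = []
    for k, (out, fn, args) in enumerate(steps[:-1]):
        env[out] = fn(*(env[a] for a in args))
        allowed = {out}
        safe = True
        for later_out, _, later_args in steps[k + 1:]:
            if not set(later_args) <= allowed:
                safe = False
                break
            allowed.add(later_out)
        if safe and is_grid(env[out]):
            points.append(k)
    return points

program: List[Step] = [
    ("x1", vmirror, ("I",)),
    ("x2", hconcat, ("I", "x1")),  # still depends on the input I
    ("O", transpose, ("x2",)),
]
grid: Grid = ((1, 2), (3, 4))

points = split_points(program, grid)
k = points[0]
left, right = program[:k + 1], program[k + 1:]
intermediate = run(left, grid)                           # target of the left task
final = run(right, intermediate, input_name=left[-1][0])  # right task output
```

In this toy program, step 0 is rejected as a split point because the following step still reads the original input, while step 1 is accepted; running the left subprogram and then the right subprogram on the intermediate grid reproduces the output of the full program.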