How to Train Your HRM
Abstract
Hierarchical Reasoning Models (HRMs) are a recently proposed architecture for complex reasoning tasks such as the Abstraction and Reasoning Corpus (ARC-AGI) challenge. In ARC-AGI, the objective is to learn an underlying transformation that is demonstrated by example input–output pairs. The HRM learns each transformation via supervised learning on the demonstration pairs. Because every task involves an entirely new transformation, test-time training on the evaluation tasks is necessary. We investigate training curricula for HRMs that compensate for limited test-time compute. These curricula focus on three stages: (1) offline pre-training on the available training data; (2) test-time fine-tuning on the evaluation tasks; and (3) test-time, per-task 'overfitting', in which a specialized model is trained for each task. Our results suggest that pre-training can offer early gains, although these may not persist. They also suggest that fine-tuning on all tasks (training and evaluation) is optimal. The majority of test-time compute should be spent on fine-tuning rather than overfitting, typically at a ratio of 2:1 or more.
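The fine-tuning to overfitting guideline above reduces to simple budget arithmetic. The sketch below is hypothetical and makes several assumptions. It measures test-time compute in optimizer steps. The `ratio` parameter is the fine-tuning:overfitting split. The overfitting share is divided evenly across tasks, with one specialized model per task. The function name and parameters are illustrative only and do not come from the paper's code.

```python
def split_test_time_budget(total_steps: int, ratio: float = 2.0, num_tasks: int = 1) -> tuple[int, int]:
    """Hypothetical helper: split a test-time step budget between shared
    fine-tuning and per-task overfitting.

    ratio is fine-tuning : overfitting (2.0 means 2:1).
    Returns (fine_tune_steps, overfit_steps_per_task).
    """
    if total_steps < 0 or ratio < 0 or num_tasks < 1:
        raise ValueError("need total_steps >= 0, ratio >= 0, num_tasks >= 1")
    # Fine-tuning takes ratio / (ratio + 1) of the total budget.
    fine_tune_steps = round(total_steps * ratio / (ratio + 1))
    # The remainder is shared out evenly, one specialized model per task.
    overfit_total = total_steps - fine_tune_steps
    return fine_tune_steps, overfit_total // num_tasks


# Example: 3000 steps at 2:1 across 10 tasks.
print(split_test_time_budget(3000, ratio=2.0, num_tasks=10))  # (2000, 100)
```

Raising `ratio` above 2.0 corresponds to the "or more" part of the guideline. Any steps lost to integer division in the per-task split could instead be added back to fine-tuning.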