

Poster

STAR: Synthesis of Tailored Architectures

Armin Thomas · Rom Parnichkun · Alexander Amini · Stefano Massaroli · Michael Poli

Hall 3 + Hall 2B #232
Fri 25 Apr midnight PDT — 2:30 a.m. PDT
 
Oral presentation: Oral Session 3C
Thu 24 Apr 7:30 p.m. PDT — 9 p.m. PDT

Abstract:

Iterative improvement of model architectures is fundamental to deep learning: Transformers first enabled scaling, and recent advances in model hybridization have pushed the quality-efficiency frontier. However, optimizing architectures remains challenging and expensive. Existing automated and manual approaches fall short because progress in search-space design has been limited and the resulting patterns and heuristics remain simple. In this work, we propose a new approach for the synthesis of tailored architectures (STAR). Our approach combines a novel search space, grounded in the theory of linear input-varying systems, with a hierarchical numerical encoding of architectures into genomes. STAR genomes are automatically refined and recombined with gradient-free evolutionary algorithms to optimize for multiple model quality and efficiency metrics. Using STAR, we optimize large populations of new architectures that leverage diverse computational units and interconnection patterns, improving over highly optimized Transformers and striped hybrid models on the frontier of quality, parameter size, and inference cache for autoregressive language modeling.
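The abstract describes refining and recombining numerical genomes with gradient-free evolutionary algorithms under several objectives at once. It does not specify STAR's genome layout, mutation and recombination operators, or selection scheme, so the following is only a generic sketch under stated assumptions, not STAR's method. It assumes a hypothetical flat genome of integer "slots" (`GENOME_LEN`, `NUM_CHOICES`), uses standard point mutation and single-point crossover, and keeps the Pareto front of two minimized objectives. The `evaluate` function is a toy stand-in for the quality and efficiency metrics that a real search would obtain by training and profiling each candidate.

```python
import random
from typing import List, Tuple

Genome = List[int]
Scores = Tuple[float, float]

GENOME_LEN = 8   # hypothetical: number of integer-coded slots per genome
NUM_CHOICES = 4  # hypothetical: options per slot (e.g. which operator to use)


def random_genome(rng: random.Random) -> Genome:
    return [rng.randrange(NUM_CHOICES) for _ in range(GENOME_LEN)]


def mutate(g: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    # Point mutation: resample each slot independently with probability `rate`.
    return [rng.randrange(NUM_CHOICES) if rng.random() < rate else v for v in g]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # Single-point crossover: prefix from one parent, suffix from the other.
    cut = rng.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]


def evaluate(g: Genome) -> Scores:
    # Toy stand-in for real metrics; both objectives are minimized.
    quality_loss = sum((v - 2) ** 2 for v in g)  # pretend choice 2 is "best"
    size = sum(g)                                # pretend larger index = more parameters
    return (float(quality_loss), float(size))


def dominates(a: Scores, b: Scores) -> bool:
    # a dominates b if it is no worse on every objective and better on at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Scores]) -> List[int]:
    # Indices of candidates not dominated by any other candidate.
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def evolve(pop_size: int = 16, generations: int = 20,
           seed: int = 0) -> List[Tuple[Genome, Scores]]:
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [evaluate(g) for g in pop]
        # Elitism: the current Pareto front survives unchanged.
        parents = [pop[i] for i in pareto_front(scores)]
        children: List[Genome] = []
        # Refill the population with recombined and mutated offspring.
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    scores = [evaluate(g) for g in pop]
    return [(pop[i], scores[i]) for i in pareto_front(scores)]
```

Calling `evolve(seed=0)` returns the non-dominated genomes of the final population with their scores, i.e. a trade-off curve between the two toy objectives rather than a single winner. Keeping the whole front instead of a weighted sum of metrics is one common way to optimize several objectives jointly; whether STAR uses this particular scheme is not stated in the abstract.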
