Birkhoff-Exact Hyper-Connections: Exact Spectral Stability for Deep Residual Networks
Hyunjun Kim
Abstract
Learnable information routing in deep networks faces a *depth-stability-efficiency trilemma*: architectures that scale to extreme depths often sacrifice efficiency, while efficient approaches lack stability guarantees. Prior work uses iterative Sinkhorn-Knopp normalization to approximate doubly stochastic mixing matrices, but the residual approximation errors destabilize training beyond several hundred layers. We propose **Birkhoff-Exact Hyper-Connections (BE-HC)**, which leverage the Birkhoff-von Neumann theorem to construct *exactly* doubly stochastic matrices as convex combinations of permutation matrices. This guarantees a spectral radius of exactly $\rho = 1$, rather than approximately 1, enabling stable training at extreme depths. **Key results:** (1) *Extreme depth:* BE-HC trains stably at **1000 layers**, achieving 35.71% accuracy where ReZero and other baselines fail to converge. (2) *Long context:* BE-HC handles **8K tokens** on a single V100 GPU (22.56% validation accuracy), while standard attention fails with out-of-memory errors. (3) *Efficiency:* 1.47× throughput improvement over attention at 4K context length. (4) *Robustness:* 4× better accuracy retention under INT8 quantization than attention. BE-HC resolves the trilemma: exact stability enables depth, permutation structure enables efficiency, and bounded Lipschitz constants enable deployment robustness.
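The Birkhoff-von Neumann construction named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names (`softmax`, `birkhoff_mix`), the softmax parameterization of the convex weights, and the choice of fixed random permutations are all assumptions made for illustration. Any weights that are nonnegative and sum to 1 give a matrix whose rows and columns each sum to 1 by construction, with no iterative normalization step.

```python
import math
import random

def softmax(logits):
    """Map unconstrained logits to nonnegative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def birkhoff_mix(perms, logits):
    """Return sum_k w_k * P_k, where P_k is the permutation matrix of perms[k].

    perms[k][row] is the column holding the single 1 in that row of P_k.
    The weights w_k come from a softmax (an assumed parameterization).
    """
    n = len(perms[0])
    weights = softmax(logits)
    mix = [[0.0] * n for _ in range(n)]
    for w, perm in zip(weights, perms):
        for row, col in enumerate(perm):
            mix[row][col] += w  # each P_k adds w_k once per row and once per column
    return mix

# Hypothetical sizes: a 4x4 mixing matrix built from 3 permutations.
rng = random.Random(0)
n, k = 4, 3
perms = [rng.sample(range(n), n) for _ in range(k)]
logits = [rng.gauss(0.0, 1.0) for _ in range(k)]

M = birkhoff_mix(perms, logits)
row_sums = [sum(r) for r in M]
col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
```

Here `row_sums` and `col_sums` equal 1 up to floating-point rounding of the weight sum. Because every entry is nonnegative and every row sums to 1, the all-ones vector is an eigenvector with eigenvalue 1 and no eigenvalue can exceed 1 in magnitude, which is the spectral radius property the abstract relies on.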