Understanding Cross-layer Contributions to Mixture-of-Experts Routing in LLMs
Abstract
Mixture-of-Experts (MoE) has become a prevalent method for scaling up large language models at reduced computational cost. Despite its effectiveness, the routing mechanism of MoE remains poorly understood from the perspective of cross-layer mechanistic interpretability. We propose a lightweight methodology that recursively breaks down an MoE routing decision into the contributions of individual model components. Using this methodology, we dissect the routing mechanism by decomposing the inputs of routers into model components, and we study how different components contribute to routing across widely used open models. Our findings on four different LLMs reveal patterns such as: a) MoE layer outputs usually contribute more than attention layer outputs to the routing decisions of subsequent layers; b) MoE entanglement, in which the activation of experts in one layer consistently correlates with the activation of experts in subsequent layers; and c) some components persistently influence routing across many following layers. We also find that models differ in the long-range and short-range inhibiting and promoting effects that components exert on routing in subsequent layers. Our results highlight the importance of quantifying the cross-layer impact of components on MoE routing in order to understand the routing mechanism.
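To make the decomposition idea concrete, the following is a minimal illustrative sketch (not the paper's implementation): in a standard MoE layer the router applies a linear projection to the residual stream before top-k selection, and the residual stream is a sum of per-component outputs (embeddings, earlier attention layers, earlier MoE layers), so the router logits decompose exactly into per-component terms. All names and the toy dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 16, 4

# Hypothetical per-component writes to the residual stream at some layer.
components = {
    "embedding": rng.normal(size=d_model),
    "attn_layer_0": rng.normal(size=d_model),
    "moe_layer_0": rng.normal(size=d_model),
}
x = sum(components.values())  # residual stream = sum of component outputs

W_router = rng.normal(size=(n_experts, d_model))  # linear routing gate
logits = W_router @ x  # router logits before top-k / softmax

# Linearity of the gate lets the logits split exactly into per-component terms:
contrib = {name: W_router @ h for name, h in components.items()}
assert np.allclose(sum(contrib.values()), logits)

# Each entry of contrib[name] is that component's push toward (or away from)
# each expert, which can then be aggregated across tokens and layers.
```

In a real model, a normalization layer typically sits between the residual stream and the router, so such a decomposition would be approximate or would require folding the normalization into the projection; the sketch above omits that detail.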