Poster
in
Workshop: SCOPE: SCALABLE OPTIMIZATION FOR EFFICIENT AND ADAPTIVE FOUNDATION MODELS
Universal LLM Routing with Correctness-Based Representation
Wittawat Jitkrittum · Harikrishna Narasimhan · Ankit Singh Rawat · Jeevesh Juneja · Zifeng Wang · Chen-Yu Lee · Pradeep Shenoy · Rina Panigrahy · Aditya Krishna Menon · Sanjiv Kumar
Keywords: [ adaptive computation ] [ learning to defer ] [ routing ]
Large language models’ significant advances in capabilities are accompanied by substantial increases in inference costs. Model routing is a simple technique for reducing inference cost, wherein one maintains a pool of candidate LLMs and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that represents each LLM as a feature vector, derived from its predictions on a set of representative prompts. Building on this representation, we detail an effective cluster-based routing strategy. We prove that the strategy is an estimate of a theoretically optimal routing rule. Experiments on a range of public benchmarks show the effectiveness of the proposal in routing amongst more than 30 unseen LLMs.
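To make the cluster-based idea concrete, here is a minimal sketch (not the paper's exact algorithm) under the following assumptions: each LLM is summarized by binary correctness labels on a shared set of representative validation prompts, prompts are clustered by their embeddings, and a test prompt is routed to the cheapest LLM whose per-cluster accuracy clears a threshold. All names (`build_llm_features`, `route`, `min_accuracy`, the KMeans choice) are illustrative, not taken from the paper.

```python
# Hypothetical sketch of correctness-based features + cluster routing.
import numpy as np
from sklearn.cluster import KMeans


def build_llm_features(correctness, prompt_embeddings, n_clusters=20, seed=0):
    """Represent each LLM by its per-cluster accuracy on validation prompts.

    correctness: (n_llms, n_prompts) binary matrix; correctness[m, i] = 1 if
        LLM m answered validation prompt i correctly.
    prompt_embeddings: (n_prompts, d) embeddings of the validation prompts.
    Returns the fitted clustering model and an (n_llms, n_clusters) feature matrix.
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(prompt_embeddings)
    labels = km.labels_
    feats = np.zeros((correctness.shape[0], n_clusters))
    for c in range(n_clusters):
        mask = labels == c
        feats[:, c] = correctness[:, mask].mean(axis=1)  # per-cluster accuracy
    return km, feats


def route(prompt_embedding, km, feats, costs, min_accuracy=0.8):
    """Send the prompt to the cheapest LLM whose estimated accuracy on the
    prompt's cluster exceeds `min_accuracy`; otherwise use the most accurate."""
    c = km.predict(prompt_embedding.reshape(1, -1))[0]
    scores = feats[:, c]                      # estimated correctness per LLM
    feasible = np.where(scores >= min_accuracy)[0]
    if feasible.size == 0:
        return int(np.argmax(scores))
    return int(feasible[np.argmin(costs[feasible])])
```

Under these assumptions, adding a previously unseen LLM at test time only requires scoring it on the representative prompts to obtain its feature vector; the router itself needs no retraining.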