

Large Language Models as Tool Makers

Tianle Cai · Xuezhi Wang · Tengyu Ma · Xinyun Chen · Denny Zhou

Halle B #280
Fri 10 May 7:30 a.m. PDT — 9:30 a.m. PDT


Recent research has highlighted the potential of large language models (LLMs) to improve their problem-solving capabilities with the aid of suitable external tools. In our work, we further advance this concept by introducing a closed-loop framework, referred to as LLMs As Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving. Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks, where a tool is implemented as a Python utility function. 2) tool using: another LLM acts as the tool user, which applies the tool built by the tool maker for problem-solving. The tool user can be either the same or a different LLM from the tool maker. On the problem-solving server side, tool-making enables continual tool generation and caching as new requests emerge. This framework enables subsequent requests to access cached tools via their corresponding APIs, enhancing the efficiency of task resolution. Beyond enabling LLMs to create their own tools, our framework also uncovers intriguing opportunities to optimize the serving cost of LLMs: Recognizing that tool-making requires more sophisticated capabilities, we assign this task to a powerful, albeit resource-intensive, model. Conversely, the simpler tool-using phase is delegated to a lightweight model. This strategic division of labor allows the once-off cost of tool-making to be spread over multiple instances of tool-using, significantly reducing average costs while maintaining strong performance. Furthermore, our method offers a functional cache through the caching and reuse of tools, which stores the functionality of a class of requests instead of the natural language responses from LLMs, thus extending the applicability of the conventional cache mechanism. We evaluate our approach across various complex reasoning tasks, including Big-Bench tasks. With GPT-4 as the tool maker and GPT-3.5 as the tool user, LATM demonstrates performance equivalent to using GPT-4 for both roles, but with a significantly reduced inference cost.
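
The functional cache and the cost argument described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: `make_tool_source`, `FunctionalCache`, `solve`, and `average_cost` are hypothetical names, the tool maker is replaced by a canned lookup rather than a real LLM call, and the tool user simply passes the request through to the cached function. The cost figures in the usage lines are made-up units chosen only to show the amortization.

```python
from typing import Callable, Dict


def make_tool_source(task_type: str) -> str:
    """Hypothetical stand-in for the expensive tool maker.

    A real system would prompt a strong LLM here; this stub returns
    canned Python source so the sketch runs without any model.
    """
    canned = {
        "sort_words": (
            "def tool(text):\n"
            "    return ' '.join(sorted(text.split()))\n"
        ),
    }
    return canned[task_type]


class FunctionalCache:
    """Caches one callable per task type, not one answer per request."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}
        self.maker_calls = 0  # how often the expensive maker was invoked

    def get_tool(self, task_type: str) -> Callable[[str], str]:
        if task_type not in self._tools:
            self.maker_calls += 1
            namespace: dict = {}
            # Assumption: the generated source is trusted. Real deployments
            # would need sandboxing and validation before executing it.
            exec(make_tool_source(task_type), namespace)
            self._tools[task_type] = namespace["tool"]
        return self._tools[task_type]


def solve(cache: FunctionalCache, task_type: str, request: str) -> str:
    """Tool-user step, simplified: apply the cached tool to the request.

    In LATM a lightweight LLM would translate the request into a call;
    here the request is passed to the tool directly.
    """
    return cache.get_tool(task_type)(request)


def average_cost(maker_cost: float, user_cost: float, n_requests: int) -> float:
    """Per-request cost when one tool-making call serves n_requests."""
    return maker_cost / n_requests + user_cost


cache = FunctionalCache()
print(solve(cache, "sort_words", "pear apple fig"))  # apple fig pear
print(solve(cache, "sort_words", "b a"))             # a b
print(cache.maker_calls)                             # 1
print(average_cost(10.0, 1.0, 5))                    # 3.0
```

The second request reuses the tool built for the first, so the maker is invoked once for the whole class of requests. A conventional response cache would miss on the second request because its text differs from the first.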
