

Poster

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo · Qingfeng Sun · Can Xu · Pu Zhao · Jian-Guang Lou · Chongyang Tao · Xiubo Geng · Qingwei Lin · Shifeng Chen · Yansong Tang · Dongmei Zhang

Hall 3 + Hall 2B #612
Thu 24 Apr midnight PDT — 2:30 a.m. PDT
Oral presentation: Oral Session 1B
Wed 23 Apr 7:30 p.m. PDT — 9 p.m. PDT

Abstract:

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are pre-trained only on large-scale internet data, without any math-specific optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of LLMs by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we demonstrate the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses all other open-source LLMs by a substantial margin. Furthermore, WizardMath 70B even outperforms ChatGPT-3.5, Claude Instant, Gemini Pro, and Mistral Medium. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.
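The abstract describes RLEIF only at a high level: evolve math instructions, then apply reinforcement learning guided by reward models, with process supervision scoring intermediate reasoning steps rather than just final answers. Purely as an illustration, a minimal toy loop consistent with that description might look like the sketch below; every name in it (evolve_instruction, instruction_reward, process_reward, ToyPolicy, train_rleif) is a hypothetical placeholder, not the authors' released code or API.

```python
"""Illustrative sketch only: a toy RLEIF-style loop inferred from the
abstract (instruction evolution + process-supervised RL). All names are
hypothetical stand-ins, not WizardMath's actual implementation."""

import random
from dataclasses import dataclass


def evolve_instruction(seed: str) -> str:
    # Evol-Instruct stage: rewrite a seed problem into a harder variant.
    op = random.choice(["Add a constraint", "Use larger numbers", "Chain two steps"])
    return f"{op}: {seed}"


def instruction_reward(problem: str) -> float:
    # Stub instruction reward model: score the quality of an evolved problem.
    return min(len(problem) / 100.0, 1.0)


def process_reward(step: str) -> float:
    # Stub process reward model: score one intermediate reasoning step,
    # rather than only the final answer (process supervision).
    return 1.0 if "=" in step else 0.5


@dataclass
class ToyPolicy:
    # Stand-in for an LLM policy; tracks a running reward baseline.
    baseline: float = 0.0

    def generate(self, problem: str) -> list[str]:
        # Pretend chain-of-thought: return a few solution steps.
        return [f"step {i}: ... = {i}" for i in range(1, 4)]

    def update(self, advantage: float) -> None:
        # Stand-in for an RL (e.g., PPO-style) policy update.
        self.baseline += 0.1 * advantage


def train_rleif(policy: ToyPolicy, seeds: list[str], iterations: int = 100) -> None:
    for _ in range(iterations):
        problem = evolve_instruction(random.choice(seeds))
        steps = policy.generate(problem)
        # Combine instruction-level and step-level rewards.
        reward = instruction_reward(problem) + sum(process_reward(s) for s in steps)
        policy.update(reward - policy.baseline)


if __name__ == "__main__":
    train_rleif(ToyPolicy(), ["What is 7 * 8?", "Solve x + 3 = 10."])
```

In a real system the stubs above would be replaced by an LLM policy, trained reward models, and a proper RL optimizer; the sketch only shows how the abstract's two ingredients (evolution and process-level feedback) could fit into one loop.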
