

Poster
in
Workshop: Workshop on Reasoning and Planning for Large Language Models

PHYSICS: Benchmarking Foundation Models for PhD-Qualifying Exam Physics Problem Solving

Kaiyue Feng · Yilun Zhao · Yixin Liu · Tianyu Yang · Chen Zhao · John Sous · Arman Cohan


Abstract:

We introduce PHYSICS, a comprehensive benchmark for PhD-qualifying exam physics problem solving. PHYSICS contains 1,297 expert-annotated problems covering six core fields: classical mechanics, quantum mechanics, thermodynamics and statistical mechanics, electromagnetism, atomic physics, and optics. Problems in these fields require professional physics knowledge and advanced mathematical reasoning. We develop a robust automated evaluation system for precise and reliable validation. Our assessment of leading foundation models reveals that even the most advanced model, DeepSeek-R1, achieves only 45.4% accuracy, highlighting significant gaps in their ability to solve high-level scientific problems. Our comprehensive analysis provides insights for targeted improvements for future advancements and underscores the importance of PHYSICS in pushing the boundaries of foundation models' capabilities in advanced scientific reasoning and problem solving.
