Poster Sat, Apr 25, 2026 • 6:30 AM – 9:00 AM PDT Pavilion 3 P3-#1609

ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity

Xiaoyang Liu ⋅ Tao Zhu ⋅ Zineng Dong ⋅ Yuntian Liu ⋅ Guo qingfeng ⋅ Liu ZhaoXuan ⋅ Yu Chen ⋅ Tao Luo

Project Page [ OpenReview]

Abstract

Despite significant strides in statement autoformalization, a critical gap remains in the development of automated evaluation metrics capable of assessing formal translation quality. Existing metrics often fail to balance semantic and structural information: string-based methods neglect semantics, whereas proof-based approaches offer no graded similarity when proofs fail. To address these issues, we introduce ASSESS (A Semantic and Structural Evaluation Framework for Statement Similarity), which captures syntactic structure by transforming formal statements into operator trees and computes a real-valued similarity score using our novel TransTED (Transformation Tree Edit Distance) Similarity metric by incorporating semantic transformations. For rigorous validation, we present EPLA (Evaluating Provability and Likeness for Autoformalization), a benchmark comprising 1,247 expert-annotated formal statement pairs derived from miniF2F and ProofNet, distinctively labeled for both semantic provability and structural likeness. Experiments on the EPLA benchmark demonstrate that TransTED Similarity surpasses existing methods, achieving state-of-the-art accuracy and Kappa score. The benchmark dataset, code, and detailed experimental results are available at https://github.com/XiaoyangLiu-sjtu/ASSESS.

Video

Chat is not available.