Search All 2024 Events

8 Results

Workshop
CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text
Zhenru Lin · Yiqun Yao · Yang Yuan
Workshop
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Shibo Hao · Yi Gu · Haotian Luo · Tianyang Liu · Xiyan Shao · Xinyuan Wang · Shuhua Xie · Haodi Ma · Adithya Samavedhi · Qiyue Gao · Zhen Wang · Zhiting Hu
Workshop
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness
Danna Zheng · Danyang Liu · Mirella Lapata · J Pan
Poster (Tue 1:45)
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Yidong Wang · Zhuohao Yu · Wenjin Yao · Zhengran Zeng · Linyi Yang · Cunxiang Wang · Hao Chen · Chaoya Jiang · Rui Xie · Jindong Wang · Xing Xie · Wei Ye · Shikun Zhang · Yue Zhang
Poster (Wed 1:45)
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan · Weize Chen · Yusheng Su · Jianxuan Yu · Wei Xue · Shanghang Zhang · Jie Fu · Zhiyuan Liu
Workshop
LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Game
Sahar Abdelnabi · Amr Gomaa · Sarath Sivaprasad · Lea Schönherr · Mario Fritz
Workshop
[Online Presentation] DELE: Data Efficient LLM Evaluation
Gayathri Saranathan · Mahammad Parwez Alam · James Lim · Suparna Bhattacharya · Soon Wong · Martin Foltin · Cong Xu
Workshop
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Chang Ma · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He