9 Results
Workshop
|
CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text Zhenru Lin · Yiqun Yao · Yang Yuan |
||
Workshop
|
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models Shibo Hao · Yi Gu · Haotian Luo · Tianyang Liu · Xiyan Shao · Xinyuan Wang · Shuhua Xie · Haodi Ma · Adithya Samavedhi · Qiyue Gao · Zhen Wang · Zhiting Hu |
||
Workshop
|
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness Danna Zheng · Danyang Liu · Mirella Lapata · J Pan |
||
Workshop
|
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness Danna Zheng · Danyang Liu · Mirella Lapata · J Pan |
||
Poster
|
Tue 1:45 |
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization Yidong Wang · Zhuohao Yu · Wenjin Yao · Zhengran Zeng · Linyi Yang · Cunxiang Wang · Hao Chen · Chaoya Jiang · Rui Xie · Jindong Wang · Xing Xie · Wei Ye · Shikun Zhang · Yue Zhang |
|
Poster
|
Wed 1:45 |
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate Chi-Min Chan · Weize Chen · Yusheng Su · Jianxuan Yu · Wei Xue · Shanghang Zhang · Jie Fu · Zhiyuan Liu |
|
Workshop
|
LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Game Sahar Abdelnabi · Amr Gomaa · Sarath Sivaprasad · Lea Schönherr · Mario Fritz |
||
Workshop
|
[***Online Presentation***] DELE: Data Efficient LLM Evaluation Gayathri Saranathan · Mahammad Parwez Alam · James Lim · Suparna Bhattacharya · Soon Wong · Martin Foltin · Cong Xu |
||
Workshop
|
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents Chang Ma · Junlei Zhang · Zhihao Zhu · Cheng Yang · Yujiu Yang · Yaohui Jin · Zhenzhong Lan · Lingpeng Kong · Junxian He |