Workshop
|
|
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
Seongyun Lee · Seungone Kim · Sue Park · Geewook Kim · Minjoon Seo
|
|
Workshop
|
|
Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
Florian Eddie Dorner · Moritz Hardt
|
|
Workshop
|
|
Evaluating predictive patterns of antigen specific B cells by single cell transcriptome and antibody repertoire sequencing
Lena Erlach · Raphael Kuhn · Andreas Agrafiotis · Danielle Shlesinger · Alexander Yermanos · Sai Reddy
|
|
Workshop
|
|
Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications
Ricardo Knauer · Erik Rodner
|
|
Workshop
|
|
Evaluating Large Language Models in an Emerging Domain: A Pilot Study in Decentralized Finance
Joshua Pearlson · Xiaoyuan Liu · Chengsong Huang · Kripa George · Dawn Song · Chenguang Wang
|
|
Workshop
|
Sat 2:40
|
DARKIN: A zero-shot classification benchmark and an evaluation of protein language models
Emine Ayşe Sunar · Zeynep Işık · Mert Pekey · Ramazan Gokberk Cinbis · Oznur Tastan
|
|
Workshop
|
|
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness
Danna Zheng · Danyang Liu · Mirella Lapata · J Pan
|
|
Workshop
|
|
CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text
Zhenru Lin · Yiqun Yao · Yang Yuan
|
|
Workshop
|
|
Enhancing and Evaluating Logical Reasoning Abilities of Large Language Models
Shujie Deng · Honghua Dong · Xujie Si
|
|
Workshop
|
|
On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models
Ken Liu · Zhoujie Ding · Berivan Isik · Sanmi Koyejo
|
|