18 Results
Type | Time | Title | Authors
Oral | Thu 1:00 | Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Xiangyu Qi · Yi Zeng · Tinghao Xie · Pin-Yu Chen · Ruoxi Jia · Prateek Mittal · Peter Henderson
Poster | Thu 1:45 | Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Xiangyu Qi · Yi Zeng · Tinghao Xie · Pin-Yu Chen · Ruoxi Jia · Prateek Mittal · Peter Henderson
Poster | Thu 7:30 | FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling | Yu Tian · Min Shi · Yan Luo · Ava Kouhana · Tobias Elze · Mengyu Wang
Poster | Wed 7:30 | Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs | Shashank Gupta · Vaishnavi Shrivastava · Ameet Deshpande · Ashwin Kalyan · Peter Clark · Ashish Sabharwal · Tushar Khot
Poster | Thu 1:45 | Quality-Diversity through AI Feedback | Herbie Bradley · Andrew Dai · Hannah Teufel · Jenny Zhang · Koen Oostermeijer · Marco Bellagente · Jeff Clune · Kenneth Stanley · G. Schott · Joel Lehman
Poster | Thu 7:30 | Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models? | Yu-Lin Tsai · Chia-Yi Hsu · Chulin Xie · Chih-Hsun Lin · Jia You Chen · Bo Li · Pin-Yu Chen · Chia-Mu Yu · Chun-Ying Huang