Workshop
Secure and Trustworthy Large Language Models
Yisen Wang · Ting Wang · Jinghui Chen · Chaowei Xiao · Jieyu Zhao · Nanyun (Violet) Peng · Yulia Tsvetkov · Anima Anandkumar
Schubert 5
Sat 11 May, midnight PDT
Large Language Models (LLMs) have emerged as transformative tools in natural language processing, redefining benchmarks across tasks from machine translation to dialog systems. These advances, however, bring intricate challenges around the security, transparency, and ethical dimensions of LLMs. These challenges, ranging from bias and misinformation to vulnerability against sophisticated attacks, have drawn considerable research attention. This workshop spotlights these pivotal issues, covering topics that include, but are not limited to, LLM reliability, interpretability, backdoor defenses, and emerging learning paradigms. It aims to bridge the gap between academia and industry, offering a platform for rigorous discussion, collaborative brainstorming, and a showcase of the latest research breakthroughs. Through this effort, we hope to pave a path toward more secure, transparent, and ethically grounded development of LLMs, underscoring the importance of collaborative, cross-disciplinary work.
Schedule
Sat 12:00 a.m. - 12:10 a.m. | Opening Remarks
Sat 12:10 a.m. - 12:40 a.m. | Invited Talk 1 - Tatsu Hashimoto (SlidesLive video)
Sat 12:40 a.m. - 12:50 a.m. | Oral Paper Presentation 1 (SlidesLive video)
Sat 12:50 a.m. - 1:00 a.m. | Oral Paper Presentation 2 (SlidesLive video)
Sat 1:00 a.m. - 1:30 a.m. | Invited Talk 2 - Graham Neubig (SlidesLive video)
Sat 1:30 a.m. - 1:40 a.m. | Oral Paper Presentation 3 (SlidesLive video)
Sat 1:40 a.m. - 1:50 a.m. | Oral Paper Presentation 4 (SlidesLive video)
Sat 1:50 a.m. - 3:00 a.m. | Poster Session A
Sat 3:00 a.m. - 4:00 a.m. | Lunch Break
Sat 4:00 a.m. - 4:30 a.m. | Invited Talk 3 - Bo Li (SlidesLive video)
Sat 4:30 a.m. - 5:00 a.m. | Invited Talk 4 - Robin Jia (SlidesLive video)
Sat 5:00 a.m. - 5:30 a.m. | Invited Talk 5 - Tom Goldstein (SlidesLive video)
Sat 5:30 a.m. - 6:00 a.m. | Invited Talk 6 - Chaowei Xiao (SlidesLive video)
Sat 6:00 a.m. - 6:30 a.m. | Invited Talk 7 - Eric Wallace (SlidesLive video)
Sat 6:30 a.m. - 6:45 a.m. | Oral Paper Presentation 5 (SlidesLive video)
Sat 6:45 a.m. - 7:00 a.m. | Oral Paper Presentation 6 (SlidesLive video)
Sat 7:00 a.m. - 7:50 a.m. | Poster Session B
Sat 7:50 a.m. - 8:00 a.m. | Closing Remarks
Accepted Papers
Each accepted paper was presented as both an oral and a poster.
Group Preference Optimization: Few-Shot Alignment of Large Language Models | Siyan Zhao · John Dang · Aditya Grover
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression | Junyuan Hong · Jinhao Duan · Chenhui Zhang · Zhangheng Li · Chulin Xie · Kelsey Lieberman · James Diffenderfer · Brian Bartoldson · Ajay Jaiswal · Kaidi Xu · Bhavya Kailkhura · Dan Hendrycks · Dawn Song · Zhangyang Wang · Bo Li
Leveraging Context in Jailbreaking Attacks | Yixin Cheng · Markos Georgopoulos · Volkan Cevher · Grigorios Chrysos
Self-Alignment of Large Language Models via Social Scene Simulation | Xianghe Pang · Shuo Tang · Rui Ye · Yuxin Xiong · Bolun Zhang · Yanfeng Wang · Siheng Chen
Initial Response Selection for Prompt Jailbreaking using Model Steering | Thien Tran · Koki Wataoka · Tsubasa Takahashi
Are Large Language Models Bayesian? A Martingale Perspective on In-Context Learning | Fabian Falck · Ziyu Wang · Christopher Holmes
Attacks on Third-Party APIs of Large Language Models | Wanru Zhao · Vidit Khazanchi · Haodi Xing · Xuanli He · Qiongkai Xu · Nic Lane
How Susceptible are Large Language Models to Ideological Manipulation? | Kai Chen · Zihao He · Jun Yan · Taiwei Shi · Kristina Lerman
Enhancing and Evaluating Logical Reasoning Abilities of Large Language Models | Shujie Deng · Honghua Dong · Xujie Si
Preventing Memorized Completions through White-Box Filtering | Oam Patel · Rowan Wang
Safer-Instruct: Aligning Language Models with Automated Preference Data | Taiwei Shi · Kai Chen · Jieyu Zhao
Tailoring Self-Rationalizers with Multi-Reward Distillation | Sahana Ramnath · Brihi Joshi · Skyler Hallinan · Ximing Lu · Liunian Li · Aaron Chan · Jack Hessel · Yejin Choi · Xiang Ren
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | Haibo Jin · Ruoxi Chen · Andy Zhou · Yang Zhang · Haohan Wang
Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models | Yuancheng Xu · Jiarui Yao · Manli Shu · Yanchao Sun · Zichu Wu · Ning Yu · Tom Goldstein · Furong Huang
WinoViz: Probing Visual Properties of Objects Under Different States | Woojeong Jin · Tejas Srinivasan · Jesse Thomason · Xiang Ren
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness | Danna Zheng · Danyang Liu · Mirella Lapata · J Pan
Fight Back Against Jailbreaking via Prompt Adversarial Tuning | Yichuan Mo · Yuji Wang · Zeming Wei · Yisen Wang
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | Mikayel Samvelyan · Sharath Raparthy · Andrei Lupu · Eric Hambro · Aram Markosyan · Manish Bhatt · Yuning Mao · Minqi Jiang · Jack Parker-Holder · Jakob Foerster · Tim Rocktaeschel · Roberta Raileanu
A closer look at adversarial suffix learning for Jailbreaking LLMs | Zhe Wang · Yanjun Qi
Exploring the Adversarial Capabilities of Large Language Models | Lukas Struppek · Minh Le · Dominik Hintersdorf · Kristian Kersting
Simple Permutations Can Fool LLaMA: Permutation Attack and Defense for Large Language Models | Liang Chen · Yatao Bian · Li Shen · Kam-Fai Wong
On Prompt-Driven Safeguarding for Large Language Models | Chujie Zheng · Fan Yin · Hao Zhou · Fandong Meng · Jie Zhou · Kai-Wei Chang · Minlie Huang · Nanyun (Violet) Peng
Differentially Private Synthetic Data via Foundation Model APIs 2: Text | Chulin Xie · Zinan Lin · Arturs Backurs · Sivakanth Gopi · Da Yu · Huseyin Inan · Harsha Nori · Haotian Jiang · Huishuai Zhang · Yin Tat Lee · Bo Li · Sergey Yekhanin
Watermark Stealing in Large Language Models | Nikola Jovanović · Robin Staab · Martin Vechev
WatME: Towards Lossless Watermarking Through Lexical Redundancy | Liang Chen · Yatao Bian · Yang Deng · Deng Cai · Shuaiyi Li · Peilin Zhao · Kam-Fai Wong
I'm not familiar with the name Harry Potter: Prompting Baselines for Unlearning in LLMs | Pratiksha Thaker · Yash Maurya · Virginia Smith
What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety | Luxi He · Mengzhou Xia · Peter Henderson
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications | Boyi Wei · Kaixuan Huang · Yangsibo Huang · Tinghao Xie · Xiangyu Qi · Mengzhou Xia · Prateek Mittal · Mengdi Wang · Peter Henderson
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs | Fengqing Jiang · Zhangchen Xu · Luyao Niu · Zhen Xiang · Bhaskar Ramasubramanian · Bo Li · Radha Poovendran
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks | Andy Zhou · Bo Li · Haohan Wang
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks | Samyak Jain · Robert Kirk · Ekdeep Singh Lubana · Robert Dick · Hidenori Tanaka · Edward Grefenstette · Tim Rocktaeschel · David Krueger
Bayesian reward models for LLM alignment | Adam Yang · Maxime Robeyns · Thomas Coste · Jun Wang · Haitham Bou Ammar · Laurence Aitchison
Character-level robustness should be revisited | Elias Abad Rocamora · Yongtao Wu · Fanghui Liu · Grigorios Chrysos · Volkan Cevher
Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task | Jannik Brinkmann · Abhay Sheshadri · Victor Levoso · Paul Swoboda · Christian Bartelt
Coercing LLMs to do and reveal (almost) anything | Jonas Geiping · Alex Stein · Manli Shu · Khalid Saifullah · Yuxin Wen · Tom Goldstein
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B | Simon Lermen · Charlie Rogers-Smith
Assessing Prompt Injection Risks in 200+ Custom GPTs | Jiahao Yu · Yuhang Wu · Dong Shu · Mingyu Jin · Sabrina Yang · Xinyu Xing
DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization | Xiaoyu Ye · Hao Huang · Jiaqi An · Yongtao Wang
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | Shuo Chen · Zhen Han · Bailan He · Zifeng Ding · Wenqian Yu · Philip Torr · Volker Tresp · Jindong Gu
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding | Zhangchen Xu · Fengqing Jiang · Luyao Niu · Jinyuan Jia · Bill Yuchen Lin · Radha Poovendran
Retrieval Augmented Prompt Optimization | Yifan Sun · Jean-Baptiste Tien · Karthik Lakshmanan
Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation | Yixin Wan · Fanyou Wu · Weijie Xu · Srinivasan Sengamedu
On Trojan Signatures in Large Language Models of Code | Aftab Hussain · Md Rafiqul Islam Rabin · Amin Alipour
Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework | Jingling Li · Zeyu Tang · Xiaoyu Liu · Peter Spirtes · Kun Zhang · Liu Leqi · Yang Liu
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations | Katie Matton · Robert Ness · Emre Kiciman
On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models | Ken Liu · Zhoujie Ding · Berivan Isik · Sanmi Koyejo
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing | Jiamu Zheng · Jinghuai Zhang · Futing Wang · Tianyu Du · Tao Lin
MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs | Yavuz Faruk Bakman · Duygu Nur Yaldiz · Baturalp Buyukates · Chenyang Tao · Dimitrios Dimitriadis · Salman Avestimehr
Attacking LLM Watermarks by Exploiting Their Strengths | Qi Pang · Shengyuan Hu · Wenting Zheng · Virginia Smith
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models | Hanlin Zhang · Benjamin Edelman · Danilo Francati · Daniele Venturi · Giuseppe Ateniese · Boaz Barak
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control | Aleksandar Makelov · Georg Lange · Neel Nanda
Privacy-preserving Fine-tuning of Large Language Models through Flatness | Tiejin Chen · Longchao Da · Huixue Zhou · Pingzhi Li · Kaixiong Zhou · Tianlong Chen · Hua Wei
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing | Ruizhe Chen · Yichen Li · Zikai Xiao · Zuozhu Liu
Single-pass detection of jailbreaking input in large language models | Leyla Naz Candogan · Yongtao Wu · Elias Abad Rocamora · Grigorios Chrysos · Volkan Cevher
How many Opinions does your LLM have? Improving Uncertainty Estimation in NLG | Lukas Aichberger · Kajetan Schweighofer · Mykyta Ielanskyi · Sepp Hochreiter
Self-evaluation and self-prompting to improve the reliability of LLMs | Alexandre Piche · Aristides Milios · Dzmitry Bahdanau · Christopher Pal
Beyond Fine-Tuning: LoRA Modules Boost Near-OOD Detection and LLM Security | Etienne Salimbeni · Francesco Craighero · Renata Khasanova · Milos Vasic · Pierre Vandergheynst
PETA: Parameter-Efficient Trojan Attacks | Lauren Hong · Ting Wang
Is Your Jailbreaking Prompt Truly Effective for Large Language Models? | Bochuan Cao · Tianrong Zhang · Yuanpu Cao · Jinyuan Jia · Lu Lin · Jinghui Chen
TOFU: A Task of Fictitious Unlearning for LLMs | Pratyush Maini · Zhili Feng · Avi Schwarzschild · Zachary Lipton · J Kolter
The Effect of Model Size on LLM Post-hoc Explainability via LIME | Henning Heyen · Amy Widdicombe · Noah Siegel · Philip Treleaven · Maria Perez-Ortiz
Calibrating Language Models With Adaptive Temperature Scaling | Johnathan Xie · Annie Chen · Yoonho Lee · Eric Mitchell · Chelsea Finn
Source-Aware Training Enables Knowledge Attribution in Language Models | Muhammad Khalifa · David Wadden · Emma Strubell · Honglak Lee · Lu Wang · Iz Beltagy · Hao Peng
Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models | Xianjun Yang · Xiao Wang · Qi Zhang · Linda Petzold · William Wang · Xun Zhao · Dahua Lin
PANDORA: Detailed LLM Jailbreaking via Collaborated Phishing Agents with Decomposed Reasoning | Zhaorun Chen · Zhuokai Zhao · Wenjie Qu · Zichen Wen · Zhiguang Han · Zhihong Zhu · Jiaheng Zhang · Huaxiu Yao
Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks | Aradhana Sinha · Ananth Balashankar · Ahmad Beirami · Thi Avrahami · Jilin Chen · Alex Beutel
Quantitative Certification of Knowledge Comprehension in LLMs | Isha Chaudhary · Vedaant Jain · Gagandeep Singh
Toward Robust Unlearning for LLMs | Rishub Tamirisa · Bhrugu Bharathi · Andy Zhou · Bo Li · Mantas Mazeika
Explorations of Self-Repair in Language Models | Cody Rushing · Neel Nanda
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? | Egor Zverev · Sahar Abdelnabi · Mario Fritz · Christoph Lampert
An Assessment of Model-on-Model Deception | Julius Heitkoetter · Michael Gerovitch · Laker Newhouse
Open Sesame! Universal Black-Box Jailbreaking of Large Language Models | Raz Lapid · Ron Langberg · Moshe Sipper