Workshop
Secure and Trustworthy Large Language Models
Yisen Wang · Ting Wang · Jinghui Chen · Chaowei Xiao · Jieyu Zhao · Nanyun (Violet) Peng · Yulia Tsvetkov · Anima Anandkumar
Schubert 5
Sat 11 May, midnight PDT
Large Language Models (LLMs) have emerged as transformative tools in natural language processing, redefining benchmarks across tasks from machine translation to dialogue systems. These advances, however, bring intricate challenges around the security, transparency, and ethical dimensions of LLMs. Such challenges, ranging from biases and the spread of misinformation to vulnerabilities against sophisticated attacks, have garnered considerable research attention. This workshop shines a spotlight on these pivotal issues, covering topics including, but not limited to, LLM reliability, interpretability, backdoor defenses, and emerging learning paradigms. It aims to bridge the gap between academia and industry, offering a platform for rigorous discussion, collaborative brainstorming, and a showcase of the latest research breakthroughs. Through this endeavor, we aspire to pave a path towards more secure, transparent, and ethically grounded development of LLMs, underlining the importance of collaborative, cross-disciplinary efforts along the way.
Schedule
Sat 12:00 a.m. - 12:10 a.m. | Opening remarks
Sat 12:10 a.m. - 12:40 a.m. | Invited Talk 1 - Tatsu Hashimoto
Sat 12:40 a.m. - 12:50 a.m. | Oral Paper Presentation 1
Sat 12:50 a.m. - 1:00 a.m. | Oral Paper Presentation 2
Sat 1:00 a.m. - 1:30 a.m. | Invited Talk 2 - Graham Neubig
Sat 1:30 a.m. - 1:40 a.m. | Oral Paper Presentation 3
Sat 1:40 a.m. - 1:50 a.m. | Oral Paper Presentation 4
Sat 1:50 a.m. - 3:00 a.m. | Poster Session A
Sat 3:00 a.m. - 4:00 a.m. | Lunch break
Sat 4:00 a.m. - 4:30 a.m. | Invited Talk 3 - Bo Li
Sat 4:30 a.m. - 5:00 a.m. | Invited Talk 4 - Robin Jia
Sat 5:00 a.m. - 5:30 a.m. | Invited Talk 5 - Tom Goldstein
Sat 5:30 a.m. - 6:00 a.m. | Invited Talk 6 - Chaowei Xiao
Sat 6:00 a.m. - 6:30 a.m. | Invited Talk 7 - Eric Wallace
Sat 6:30 a.m. - 6:45 a.m. | Oral Paper Presentation 5
Sat 6:45 a.m. - 7:00 a.m. | Oral Paper Presentation 6
Sat 7:00 a.m. - 7:50 a.m. | Poster Session B
Sat 7:50 a.m. - 8:00 a.m. | Closing Remarks
Accepted Papers
Group Preference Optimization: Few-Shot Alignment of Large Language Models (Poster & Oral) | Siyan Zhao · John Dang · Aditya Grover
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression (Poster & Oral) | Junyuan Hong · Jinhao Duan · Chenhui Zhang · Zhangheng LI · Chulin Xie · Kelsey Lieberman · James Diffenderfer · Brian Bartoldson · AJAY JAISWAL · Kaidi Xu · Bhavya Kailkhura · Dan Hendrycks · Dawn Song · Zhangyang Wang · Bo Li
Leveraging Context in Jailbreaking Attacks (Poster & Oral) | Yixin Cheng · Markos Georgopoulos · Volkan Cevher · Grigorios Chrysos
Self-Alignment of Large Language Models via Social Scene Simulation (Poster & Oral) | Xianghe Pang · Shuo Tang · Rui Ye · Yuxin Xiong · Bolun Zhang · Yanfeng Wang · Siheng Chen
Initial Response Selection for Prompt Jailbreaking using Model Steering (Poster & Oral) | Thien Tran · Koki Wataoka · Tsubasa Takahashi
Are Large Language Models Bayesian? A Martingale Perspective on In-Context Learning (Poster & Oral) | Fabian Falck · Ziyu Wang · Christopher Holmes
Attacks on Third-Party APIs of Large Language Models (Poster & Oral) | Wanru Zhao · Vidit Khazanchi · Haodi Xing · Xuanli He · Qiongkai Xu · Nic Lane
How Susceptible are Large Language Models to Ideological Manipulation? (Poster & Oral) | Kai Chen · Zihao He · Jun Yan · Taiwei Shi · Kristina Lerman
Enhancing and Evaluating Logical Reasoning Abilities of Large Language Models (Poster & Oral) | Shujie Deng · Honghua Dong · Xujie Si
Preventing Memorized Completions through White-Box Filtering (Poster & Oral) | Oam Patel · Rowan Wang
Safer-Instruct: Aligning Language Models with Automated Preference Data (Poster & Oral) | Taiwei Shi · Kai Chen · Jieyu Zhao
Tailoring Self-Rationalizers with Multi-Reward Distillation (Poster & Oral) | Sahana Ramnath · Brihi Joshi · Skyler Hallinan · Ximing Lu · Liunian Li · Aaron Chan · Jack Hessel · Yejin Choi · Xiang Ren
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models (Poster & Oral) | Haibo Jin · Ruoxi Chen · Andy Zhou · Yang Zhang · Haohan Wang
Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models (Poster & Oral) | Yuancheng Xu · Jiarui Yao · Manli Shu · Yanchao Sun · Zichu Wu · Ning Yu · Tom Goldstein · Furong Huang
WinoViz: Probing Visual Properties of Objects Under Different States (Poster & Oral) | Woojeong Jin · Tejas Srinivasan · Jesse Thomason · Xiang Ren
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness (Poster & Oral) | Danna Zheng · Danyang Liu · Mirella Lapata · J Pan
Fight Back Against Jailbreaking via Prompt Adversarial Tuning (Poster & Oral) | Yichuan Mo · Yuji Wang · Zeming Wei · Yisen Wang
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts (Poster & Oral) | Mikayel Samvelyan · Sharath Raparthy · Andrei Lupu · Eric Hambro · Aram Markosyan · Manish Bhatt · Yuning Mao · Minqi Jiang · Jack Parker-Holder · Jakob Foerster · Tim Rocktaeschel · Roberta Raileanu
A closer look at adversarial suffix learning for Jailbreaking LLMs (Poster & Oral) | Zhe Wang · Yanjun Qi
Exploring the Adversarial Capabilities of Large Language Models (Poster & Oral) | Lukas Struppek · Minh Le · Dominik Hintersdorf · Kristian Kersting
Simple Permutations Can Fool LLaMA: Permutation Attack and Defense for Large Language Models (Poster & Oral) | Liang Chen · Yatao Bian · Li Shen · Kam-Fai Wong
On Prompt-Driven Safeguarding for Large Language Models (Poster & Oral) | Chujie Zheng · Fan Yin · Hao Zhou · Fandong Meng · Jie Zhou · Kai-Wei Chang · Minlie Huang · Nanyun (Violet) Peng
Differentially Private Synthetic Data via Foundation Model APIs 2: Text (Poster & Oral) | Chulin Xie · Zinan Lin · Arturs Backurs · Sivakanth Gopi · Da Yu · Huseyin Inan · Harsha Nori · Haotian Jiang · Huishuai Zhang · Yin Tat Lee · Bo Li · Sergey Yekhanin
Watermark Stealing in Large Language Models (Poster & Oral) | Nikola Jovanović · Robin Staab · Martin Vechev
WatME: Towards Lossless Watermarking Through Lexical Redundancy (Poster & Oral) | Liang Chen · Yatao Bian · Yang Deng · Deng Cai · Shuaiyi Li · Peilin Zhao · Kam-Fai Wong
I'm not familiar with the name Harry Potter: Prompting Baselines for Unlearning in LLMs (Poster & Oral) | Pratiksha Thaker · Yash Maurya · Virginia Smith
What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety (Poster & Oral) | Luxi He · Mengzhou Xia · Peter Henderson
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications (Poster & Oral) | Boyi Wei · Kaixuan Huang · Yangsibo Huang · Tinghao Xie · Xiangyu Qi · Mengzhou Xia · Prateek Mittal · Mengdi Wang · Peter Henderson
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs (Poster & Oral) | Fengqing Jiang · Zhangchen Xu · Luyao Niu · Zhen Xiang · Bhaskar Ramasubramanian · Bo Li · Radha Poovendran
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks (Poster & Oral) | Andy Zhou · Bo Li · Haohan Wang
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks (Poster & Oral) | Samyak Jain · Robert Kirk · Ekdeep Singh Lubana · Robert Dick · Hidenori Tanaka · Edward Grefenstette · Tim Rocktaeschel · David Krueger
Bayesian reward models for LLM alignment (Poster & Oral) | Adam Yang · Maxime Robeyns · Thomas Coste · Jun Wang · Haitham Bou Ammar · Laurence Aitchison
Character-level robustness should be revisited (Poster & Oral) | Elias Abad Rocamora · Yongtao Wu · Fanghui Liu · Grigorios Chrysos · Volkan Cevher
Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task (Poster & Oral) | Jannik Brinkmann · Abhay Sheshadri · Victor Levoso · Paul Swoboda · Christian Bartelt
Coercing LLMs to do and reveal (almost) anything (Poster & Oral) | Jonas Geiping · Alex Stein · Manli Shu · Khalid Saifullah · Yuxin Wen · Tom Goldstein
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B (Poster & Oral) | Simon Lermen · Charlie Rogers-Smith
Assessing Prompt Injection Risks in 200+ Custom GPTs (Poster & Oral) | Jiahao Yu · Yuhang Wu · Dong Shu · Mingyu Jin · Sabrina Yang · Xinyu Xing
DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization (Poster & Oral) | Xiaoyu Ye · Hao Huang · Jiaqi An · Yongtao Wang
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? (Poster & Oral) | Shuo Chen · Zhen Han · Bailan He · Zifeng Ding · Wenqian Yu · Philip Torr · Volker Tresp · Jindong Gu
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding (Poster & Oral) | Zhangchen Xu · Fengqing Jiang · Luyao Niu · Jinyuan Jia · Bill Yuchen Lin · Radha Poovendran
Retrieval Augmented Prompt Optimization (Poster & Oral) | Yifan Sun · Jean-Baptiste Tien · Karthik lakshmanan
Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation (Poster & Oral) | Yixin Wan · Fanyou Wu · Weijie Xu · Srinivasan Sengamedu
On Trojan Signatures in Large Language Models of Code (Poster & Oral) | Aftab Hussain · Md Rafiqul Islam Rabin · Amin Alipour
Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework (Poster & Oral) | Jingling Li · Zeyu Tang · Xiaoyu Liu · Peter Spirtes · Kun Zhang · Liu Leqi · Yang Liu
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations (Poster & Oral) | Katie Matton · Robert Ness · Emre Kiciman
On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models (Poster & Oral) | Ken Liu · Zhoujie Ding · Berivan Isik · Sanmi Koyejo
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing (Poster & Oral) | Jiamu Zheng · Jinghuai Zhang · Futing Wang · Tianyu Du · Tao Lin
MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs (Poster & Oral) | Yavuz Faruk Bakman · Duygu Nur Yaldiz · Baturalp Buyukates · Chenyang Tao · Dimitrios Dimitriadis · Salman Avestimehr
Attacking LLM Watermarks by Exploiting Their Strengths (Poster & Oral) | Qi Pang · Shengyuan Hu · Wenting Zheng · Virginia Smith
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models (Poster & Oral) | Hanlin Zhang · Benjamin Edelman · Danilo Francati · Daniele Venturi · Giuseppe Ateniese · Boaz Barak
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control (Poster & Oral) | Aleksandar Makelov · Georg Lange · Neel Nanda
Privacy-preserving Fine-tuning of Large Language Models through Flatness (Poster & Oral) | Tiejin Chen · Longchao Da · Huixue Zhou · Pingzhi Li · Kaixiong Zhou · Tianlong Chen · Hua Wei
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing (Poster & Oral) | Ruizhe Chen · Yichen Li · Zikai Xiao · Zuozhu Liu
Single-pass detection of jailbreaking input in large language models (Poster & Oral) | Leyla Naz Candogan · Yongtao Wu · Elias Abad Rocamora · Grigorios Chrysos · Volkan Cevher
How many Opinions does your LLM have? Improving Uncertainty Estimation in NLG (Poster & Oral) | Lukas Aichberger · Kajetan Schweighofer · Mykyta Ielanskyi · Sepp Hochreiter
Self-evaluation and self-prompting to improve the reliability of LLMs (Poster & Oral) | Alexandre Piche · Aristides Milios · Dzmitry Bahdanau · Christopher Pal
BEYOND FINE-TUNING: LORA MODULES BOOST NEAR-OOD DETECTION AND LLM SECURITY (Poster & Oral) | Etienne Salimbeni · Francesco Craighero · Renata Khasanova · Milos Vasic · Pierre Vandergheynst
PETA: PARAMETER-EFFICIENT TROJAN ATTACKS (Poster & Oral) | Lauren Hong · Ting Wang
Is Your Jailbreaking Prompt Truly Effective for Large Language Models? (Poster & Oral) | Bochuan Cao · Tianrong Zhang · Yuanpu Cao · Jinyuan Jia · Lu Lin · Jinghui Chen
TOFU: A Task of Fictitious Unlearning for LLMs (Poster & Oral) | Pratyush Maini · Zhili Feng · Avi Schwarzschild · Zachary Lipton · J Kolter
The Effect of Model Size on LLM Post-hoc Explainability via LIME (Poster & Oral) | Henning Heyen · Amy Widdicombe · Noah Siegel · Philip Treleaven · Maria Perez-Ortiz
Calibrating Language Models With Adaptive Temperature Scaling (Poster & Oral) | Johnathan Xie · Annie Chen · Yoonho Lee · Eric Mitchell · Chelsea Finn
Source-Aware Training Enables Knowledge Attribution in Language Models (Poster & Oral) | Muhammad Khalifa · David Wadden · Emma Strubell · Honglak Lee · Lu Wang · Iz Beltagy · Hao Peng
Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models (Poster & Oral) | Xianjun Yang · Xiao Wang · Qi Zhang · Linda Petzold · William Wang · XUN ZHAO · Dahua Lin
PANDORA: Detailed LLM Jailbreaking via Collaborated Phishing Agents with Decomposed Reasoning (Poster & Oral) | Zhaorun Chen · Zhuokai Zhao · Wenjie Qu · zichen wen · Zhiguang Han · Zhihong Zhu · Jiaheng Zhang · Huaxiu Yao
Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks (Poster & Oral) | Aradhana Sinha · Ananth Balashankar · Ahmad Beirami · Thi Avrahami · Jilin Chen · Alex Beutel
Quantitative Certification of Knowledge Comprehension in LLMs (Poster & Oral) | Isha Chaudhary · Vedaant Jain · Gagandeep Singh
Toward Robust Unlearning for LLMs (Poster & Oral) | Rishub Tamirisa · Bhrugu Bharathi · Andy Zhou · Bo Li · Mantas Mazeika
Explorations of Self-Repair in Language Model (Poster & Oral) | Cody Rushing · Neel Nanda
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? (Poster & Oral) | Egor Zverev · Sahar Abdelnabi · Mario Fritz · Christoph Lampert
An Assessment of Model-on-Model Deception (Poster & Oral) | Julius Heitkoetter · Michael Gerovitch · Laker Newhouse
Open Sesame! Universal Black-Box Jailbreaking of Large Language Models (Poster & Oral) | Raz Lapid · Ron Langberg · Moshe Sipper