Invited Talk

Building Safe and Robust AI Systems

Zico Kolter

Hall 1 Apex
Wed 23 Apr 6 p.m. PDT — 7 p.m. PDT

Abstract:

As AI systems become more powerful, it is increasingly important that developers be able to strictly enforce desired policies for these systems. Unfortunately, via techniques such as adversarial attacks, it has traditionally been possible to circumvent model policies, allowing bad actors to manipulate LLMs for unintended and potentially harmful purposes. In this talk, I will highlight several recent directions of work that are making progress on these challenges, including methods for robustness to jailbreaks, safety pre-training, and methods for preventing undesirable model distillation. I will additionally highlight some of the areas I believe to be most crucial for future work in the field.