Building Safe and Robust AI Systems
Zico Kolter
2025 Invited Talk
Abstract
As AI systems become more powerful, it is increasingly important that developers be able to strictly enforce desired policies for these systems. Unfortunately, techniques such as adversarial attacks have traditionally made it possible to circumvent model policies, allowing bad actors to manipulate LLMs for unintended and potentially harmful purposes. In this talk, I will highlight several recent directions of work that are making progress on these challenges, including methods for robustness to jailbreaks, safety pre-training, and methods for preventing undesirable model distillation. I will additionally highlight some of the areas I believe to be most crucial for future work in the field.
Speaker
Zico Kolter
Zico Kolter is a Professor and Department Head of the Machine Learning Department at Carnegie Mellon University. In addition, he serves as the Chief Technical Advisor to Gray Swan AI, an AI security company, and serves on the board of OpenAI, where he chairs the safety and security committee. His work spans several topics in machine learning, including AI safety and robustness, LLM security, the impact of data on models, implicit models, and more.