FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

KHAN. “So control super important for the next two years and we’re optimistic we can do it. Sound about right?” (51:07)

BENTON. “Oh. Hope so.”

15,311 views 17 Mar 2025

Anthropic researchers Ethan Perez, Joe Benton, and Akbir Khan discuss AI control—an approach to managing the risks of advanced AI systems. They discuss real-world evaluations showing how humans struggle to detect deceptive AI, the three major threat models researchers are working to mitigate, and the overall idea of controlling highly-capable AI systems whose goals may differ from our own. 0:00 Introduction 0:33 What is AI control? 2:56 Control evaluations in practice 5:39 Results from evaluations 7:27 Monitoring protocols 13:18 How control differs from alignment 16:09 The challenge of alignment faking 23:10 Ensuring evaluations work for future models 26:09 Open questions in control research 34:15 Lessons learned from control 37:14 Why work on control now? 43:26 Key threat models 48:35 Optimistic signs

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.