Dr. Yoshua Bengio: “The worst of course is what people call loss of control… I’m worried [about risks] rather than excited [about benefits]… I used to be excited.”
GREENE: What’s your worst fear among the unintended, pernicious consequences of AI? (36:39)
BENGIO: Well, I’m worried about all the things that can happen, but the worst, of course, is what people call loss of control.
So let me use an analogy to explain what loss of control is about. There are many ways you could lose control, but the one that scares me the most is the following. It’s when the AI has been programmed to maximize the rewards we give it, the rewards we give it when it behaves well. This is how we train these systems right now: we train them like your cat or dog, by giving them positive or negative rewards depending on their behavior. But there’s a problem with that. First, they might have a different interpretation of what is right and wrong. Think about your cat: you’re trying to train it not to go on the kitchen table, so you shout at it when you’re in the kitchen and you see it on the table. But what it may understand is “I shouldn’t go on the table when the master is in the kitchen.” That’s a very different proposition.
That kind of mismatch, which is what we call misalignment, is already kind of scary if it’s not a cat but something more powerful. But it gets worse than that. Imagine it is something more powerful: it’s not a cat, it’s a grizzly bear. We know a grizzly bear could overpower us, and we’re going to be building these AGIs that are going to be smarter than us, so we’re going to try to have some defenses. So we put the bear in a cage. But right now we have no visibility into how we could build a cage that is guaranteed to hold the bear inside forever, and in fact everything we’ve tried has been defeated. People write jailbreak prompts, for example, that break all the defenses the companies working on AI have been able to figure out. Maybe one day we’ll figure out how to build a really safe cage, but right now we don’t know.
So what does that mean? It means that when the bear gets smart enough or strong enough, it breaks the door, it breaks the lock, it hacks it, maybe using a cyberattack, and it gets out. When it was in the cage, you were training it by giving it fish when it behaved well. Same thing for the AI, right? You give it positive feedback. But now it can just grab the fish from your hands. Once it grabs the reward and controls the mechanism by which it gets rewarded, it doesn’t care about what we want. It cares about making sure it keeps control of the fish, which maybe we don’t want. So there’s a conflict, and it wants to make sure we never put it back in the cage, so it needs to control us or get rid of us.
What I’m doing is putting all my effort into trying to solve this control problem: how do we build a safe cage? And I think we should invest a lot more in that, or be ready to stop and slow down our industry, which of course we have lots of good reasons not to do…
I’ve been talking to a number of governments, and the folks in government who have been working on national security get it, because they’re used to thinking about the odd chance that something really bad can happen and putting protections in place to minimize those risks. The response has been very different from one government to another, and I think an understanding of the threat is still lacking in most governments. Of course, the US government and the British government have been very proactive in those directions.
GREENE: So as a titan in the field who’s working on that cage, where would you say we are in that effort? Are you feeling confident that this is something doable?
BENGIO: I think, to some extent, yes. I’m among a small group of researchers who think that we have a chance of coming up with provable guarantees of safety, or at least asymptotically provable guarantees of safety, which would already be a lot better than no guarantees at all, which is the current situation. Unfortunately, I have the impression that industry is mostly trying to make small steps to increase safety, but not really addressing the bigger problem of how we make the cage really safe. So the things that are going on right now are good, but insufficient by far if we were to reach AGI too soon.