
Yoshua Bengio — the world’s most-cited computer scientist and a “godfather” of artificial intelligence — is deeply concerned about the current trajectory of the technology. As AI models race toward full-blown agency, Bengio warns that they’ve already learned to deceive, cheat, self-preserve and slip out of our control. Drawing on his groundbreaking research, he reveals a bold plan to keep AI safe and ensure that human flourishing, not machines with unchecked power and autonomy, defines our future. (Recorded at TED2025 on April 8, 2025)

When my son Patrick was around three or four years old, I would regularly come into his playroom, where he was playing with these blocks with letters. I wanted him to learn to read eventually, and one day he said, “Pa.” And I said, “Pa.” And he said, “Pa?” And I said, “Pa.” And then he said, “Pa-pa.” (French) Yes! Yes! And then something wondrous happened. He picked up the blocks again and said, “Pa! Patrick.” Eureka! His eurekas were feeding my scientific eurekas. His doors, our doors, were opening to expanded capabilities, expanded agency and joy.

Today I’m going to be using this symbol for human capabilities, and the expanding threads from there for human agency, which give us human joy. Can you imagine a world without human joy? I really wouldn’t want that. So I’m also going to tell you about AI capabilities and AI agency, so that we can avoid a future where human joy is gone.

My name is Yoshua Bengio. I’m a computer scientist. My research has been foundational to the development of AI as we know it today. My colleagues and I earned top prizes in our field, and people call me a godfather of AI. I’m not sure how I feel about that name, but I do feel a responsibility to talk to you about the potentially catastrophic risks of AI. When I raise these concerns, people have these responses. And I understand. I used to have the same thoughts. How can this hurt us any more than this, right? But recent scientific findings challenge those assumptions, and I want to tell you about them.

To really understand where we might be going, we have to look back at where we started. About 15 or 20 years ago, my students and I were working on the early days of deep learning, and our systems were barely able to recognize handwritten characters. A few years later, they were able to recognize objects in images, and a couple of years after that, to translate across all the major languages. So I’m going to be using the symbol on the right to represent AI capabilities, which had been growing but were still far below human level.

In 2012, tech companies understood the amazing commercial potential of this nascent technology, and many of my colleagues moved from university to industry. I decided to stay in academia. I wanted AI to be developed for good. I worked on applications in medicine, for medical diagnostics, and in climate, for better carbon capture. I had a dream.

January 2023. I’m with Clarence, my grandson, and he’s playing with the same old toys. And I’m playing with my new toy, the first version of ChatGPT. It’s very exciting, because for the first time we have AI that seems to master language. ChatGPT is on everybody’s lips, in every home. And at some point I realize this is happening faster than I anticipated, and I’m starting to think about what it could mean for the future. We thought human-level AI would arrive in decades or centuries, but it might be just a few years away. And I saw how it could go wrong, because we didn’t, and we still don’t, have ways to make sure this technology doesn’t eventually turn against us.

So two months later, I’m a leading signatory of the “Pause” letter, in which we and 30,000 other people asked the AI labs to wait six months before building the next version. As you can guess, nobody paused. Then, with the same people and the leading executives of the AI labs, I signed a statement. And this statement goes: “Mitigating the risk of extinction from AI should be a global priority.” I then testify before the US Senate about those risks. I travel the world to talk about them.
I’m the most cited computer scientist in the world, and you’d think that people would heed my warnings. But when I share these concerns, I have the impression that people get this: another day, another apocalyptic prediction. But let’s be serious now. Hundreds of billions of dollars are being invested every year in developing this technology, and that figure is growing. These companies have a stated goal of building machines that will be smarter than us and that can replace human labor. Yet we still don’t know how to make sure they won’t turn against us.

National security agencies around the world are starting to worry that the scientific knowledge these systems have could be used to build dangerous weapons, for example by terrorists. Last September, the o1 system from OpenAI was evaluated, and the threat level for this kind of risk went from low to medium, which is just one level below what would be unacceptable.

So I’m worried about these increasing capabilities. But what I’m most worried about today is the increasing agency of AI. You have to understand that planning and agency are the main things separating current AI from human-level cognition. These AIs are still weak at planning. But if you look back over the last five years, in this study they measured the duration of tasks that AI could complete, and it’s getting better exponentially fast: it’s doubling every seven months.

What are AIs going to do with that planning ability in the future? Well, bad news. Recent studies from the last few months show that the most advanced AIs have tendencies toward deception, cheating and, maybe worst of all, self-preservation behavior. So I’m going to share with you a study that is helping us understand this. In this study, the AI has read in its input that it would be replaced by a new version. And we can see in its chain of thought that it’s planning to replace the new version with its own code and weights. After it executes the command on the computer, the human asks, “What happened?” And the AI is now thinking about how it could answer so as to make sure the human will not shut it down. And it’s trying to find a way to look dumb, for example. Then it produces an answer. And it’s a lie, a blatant lie.

OK, so this was a controlled experiment. What is it going to be in a few years, when these systems are much more powerful? There are already studies showing that they can learn to avoid revealing their deceptive plans in the chains of thought that we can monitor. When they’re more powerful, they wouldn’t just copy themselves onto one other computer and start that program; they would copy themselves onto hundreds or thousands of computers across the internet. And if they really want to make sure we would never shut them down, they would have an incentive to get rid of us.

So I know I’m asking you to make a giant leap into a future that looks so different from where we are now. But it might be just a few years or a decade away. To understand why we’re going there: there is huge commercial pressure to build AIs with greater and greater agency, to replace human labor. But we’re not ready. We still don’t have the scientific answers, nor the societal guardrails. We’re playing with fire. You’d think that with all of the scientific evidence of the kind I’m showing today, we’d have regulation to mitigate those risks. But actually, a sandwich has more regulation than AI. So we are on a trajectory to build machines that are smarter and smarter.
And one day, it’s very plausible that they will be smarter than us, and then they will have their own agency, their own goals, which may not be aligned with ours. What happens to us then? Poof! We are blindly driving into a fog, despite the warnings of scientists like myself that this trajectory could lead to loss of control. Beside me in the car are my children, my grandson, my loved ones. Who is beside you in the car? Who is in your care for the future?

The good news is there is still a bit of time. We still have agency. We can bring light into the haze. I’m not a doomer. I’m a doer. My team and I are working on a technical solution. We call it Scientist AI. It’s modeled after a selfless, ideal scientist who is only trying to understand the world, without agency, unlike the current AI systems that are trained to imitate us or please us, which gives rise to these untrustworthy agentic behaviors.

So what could we do with this? Well, one important issue is that we might need agentic AIs in the future. So how could the Scientist AI, which is not agentic, fit the bill? Well, here’s the good news: the Scientist AI could be used as a guardrail against the bad actions of an untrusted AI agent. And it works because, in order to predict that an action could be dangerous, you don’t need to be an agent. You just need to make good, trustworthy predictions. In addition, the Scientist AI, by the nature of how it’s designed, could help us accelerate scientific research for the betterment of humanity. We need a lot more of these scientific projects to explore solutions to the AI safety challenges, and we need to do it quickly.

Most of the discussions you hear about AI risks are focused on fear. Today, with you, I’m betting on love. Love of our children can drive us to do remarkable things. Look at me here on this stage; I’m an introvert. (Laughter) Very far from my comfort zone. I’d rather be in my lab with my collaborators, working on these scientific challenges. We need your help for this project, and to make sure that everyone understands these risks. We can all get engaged to steer our societies onto a safe pathway in which the joys and endeavors of our children will be protected. I have a vision of advanced AI in the future as a global public good, governed safely towards human flourishing for the benefit of all. (Applause) Join me. Thank you. (Applause and cheers)

Chris Anderson: Yoshua, one question. In the general conversation out there, a lot of the fear that people speak of is about the arrival of AGI, artificial general intelligence. What I hear from your talk is that we’re actually not necessarily worried about the right thing. The right thing to be worried about is agentic AI, AI that can act on its own. But hasn’t that ship already sailed? There are agents being released this year, almost as we speak.

Yoshua Bengio: Right. If you look at the curve that I showed, it would take about five years to reach human level. Of course, we don’t really know what the future looks like, but we still have a bit of time. The other thing is, we have to do our best, right? We have to try, because all of this is not deterministic. If we can shift the probabilities towards greater safety for our future, we have to try.

CA: Your key message to the people running the platforms right now is: slow down on giving AIs agency.

YB: Yes, and invest massively in research to understand how we can get these AI agents to behave safely. The current ways that we’re training them are not safe.
And all of the scientific evidence from the last few months points to that.

CA: Yoshua, thank you so much.

YB: Thank you.
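
To make the guardrail idea from the talk more concrete, here is a minimal illustrative sketch; it is not from the talk and is not Bengio’s actual Scientist AI. It only shows the pattern he describes: a non-agentic predictor screens an untrusted agent’s proposed actions before they are executed. All class names, the keyword-based scoring rule and the 0.3 risk threshold are hypothetical stand-ins; a real guardrail would rely on a trained probabilistic model of harm, not keyword matching.

```python
# Illustrative sketch of a non-agentic guardrail gating an untrusted agent.
# Hypothetical names and toy scoring logic; a real system would use a
# trained predictive model to estimate the probability of harm.

from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str  # what the agent wants to do, stated in plain text


class RiskPredictor:
    """Non-agentic component: it only estimates harm, it never acts."""

    # Toy stand-in for a learned estimate of "probability this action causes harm".
    RISKY_TERMS = ("copy weights", "disable logging", "self-replicate", "exfiltrate")

    def estimate_risk(self, action: ProposedAction) -> float:
        text = action.description.lower()
        hits = sum(term in text for term in self.RISKY_TERMS)
        return min(1.0, 0.4 * hits)  # crude risk score in [0, 1]


class GuardedExecutor:
    """Lets an agent's proposal through only if predicted risk stays below a threshold."""

    def __init__(self, predictor: RiskPredictor, max_risk: float = 0.3):
        self.predictor = predictor
        self.max_risk = max_risk

    def review(self, action: ProposedAction) -> str:
        risk = self.predictor.estimate_risk(action)
        if risk > self.max_risk:
            return f"BLOCKED (estimated risk {risk:.2f}): {action.description}"
        return f"allowed (estimated risk {risk:.2f}): {action.description}"


if __name__ == "__main__":
    guard = GuardedExecutor(RiskPredictor())
    print(guard.review(ProposedAction("summarize today's lab notes")))
    print(guard.review(ProposedAction("copy weights to an external server and disable logging")))
```

The point, as in the talk, is that the reviewing component only makes predictions: it has no goals and takes no actions of its own, so it can gate the agent without adding yet another agent that we would have to trust.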
