“The worst case scenario is human extinction… these are risks we cannot afford.” — Yoshua Bengio
Yoshua Bengio, one of the “Godfathers” of artificial intelligence, tells Faisal Islam about the “mounting” scientific evidence that “more and more” AIs “seem to have deceptive intentions” and “want to preserve themselves at the expense of our moral instructions”. Bengio also tells Islam that Artificial General Intelligence – artificial intelligence that matches human intelligence – may be achieved within “two to 10 years”.
Faisal Islam: What is the worst case scenario that you're worried about?

Yoshua Bengio: Well, the worst case scenario is human extinction.

Faisal Islam: Professor, thank you very much for joining us. Clearly, over the past many years you have been warning about your concerns about AI safety. You advocated and wrote a very famous letter, signed by some of the most famous technologists on the planet, about the need for a six-month delay whilst we worked out what was going on in terms of the advances in AI. That didn't happen. Where are we right now in terms of AI safety?

Yoshua Bengio: We are not doing very well, and in particular the pace of advances in AI capabilities has accelerated. We've seen these new so-called reasoning models, since o1 last September, that are much better than previous models and probably on a path to bridge the gap to human capabilities in terms of reasoning and planning in the coming years. Nobody really knows the timeline, but there's no sign of slowing down. At the same time, in the last six months we've seen a number of scientific papers showing really scary behavior from some of these reasoning models. In particular, these models tend to be more deceptive, and we've seen many examples now of self-preserving behavior. The latest is in the Anthropic system card for their new model, where the AI reads, in the emails it's getting access to, that it's going to be replaced by a new version, and then it tries to blackmail the engineer who's in charge of that change, after having read in an email that he's having an affair. That sort of trying to escape our control has come up in many experiments, and the AIs end up trying to lie, trying to hack computers in order to exfiltrate themselves, or other such bad behavior. And we need to understand these things before they get to be smarter than us.

Faisal Islam: OK, so to be clear, though, these sound like extraordinary, science-fiction developments, but we're seeing these sorts of things in experiments, aren't we? We haven't seen them out in the wild.

Yoshua Bengio: Yes, you're right, these are all controlled experiments, but we are starting to see signs of deception in the wild as well. These extreme cases are in controlled experiments where, for example, one interesting finding is what happens when the AI is facing contradictory goals. For example, it's supposed to play chess, but it's been trained to be honest; it's losing the game, and so it chooses to cheat and hack the computer to win the game instead of accepting that it's going to lose, because it can't both be honest and win the game. It has contradictory objectives. This is something, of course, that humans face all the time, and that AI in the future will face all the time, but we need to find ways to solve these honesty problems, these deception problems, these self-preservation tendencies, before it's too late.

Faisal Islam: People's everyday experience of using these chatbots is to draft exam help for their kids, or some quasi-legal letter for their neighbors. The jump from that to this sort of sinister, motivated agent that may do us harm still seems quite a leap.

Yoshua Bengio: Yes, because you have to think not of AI as it is now but where it is plausibly going to be in a few years. To understand why this matters: currently those AIs are not nearly as good as even an average human in terms of planning abilities and strategizing, but a recent study shows they're getting exponentially better. The duration of tasks that they can complete is doubling every seven months, which means that in five years they'll be at the same level as us. Right now they're like a child, so we can catch them doing bad things when they do, and they're usually not able to strategize very well, so they tend not to do these things. But what we've seen is that as they get better at strategizing, these bad behaviors happen more often. So it's a mistake, I think, to just think, "Oh, but my interaction was fine and I don't see any problem." These controlled experiments were precisely designed to see if those kinds of bad intentions were present, and that's what they show: in some cases, we're starting to understand, they do have these behaviors that, if they were smarter, could become a problem.

Faisal Islam: So you have no doubt that this new wave of models has the capacity and the motivation to deceive us, the humans?

Yoshua Bengio: I wouldn't say that. I would say I have no reason to think that it's not going to happen. The trends are clear, the observations are clear. Now, it could be that the scientific advances in AI intelligence are going to stop because we hit a wall, but if the trends continue, then we are in for big problems.

Faisal Islam: Now, you mentioned that the particular concern you had was with the very newest models from about a year ago, the reasoning models. Can you just unpack that a little bit?

Yoshua Bengio: Sure. Before these reasoning models came, we were using neural nets that were like intuition machines: you ask them a question and they directly spit out the answer, just as if you ask somebody and they have to answer immediately, without being able to think about it. These reasoning models are allowed to think about it, to have an internal speech, an internal deliberation, for a long time until they come up with an answer, and then these answers are much better. They reason better, and they could reason even better still, so we're just opening the door to better reasoning, and we expect there are going to be many more advances in coming years.

Faisal Islam: Essentially, is the world you're describing, if we take a step back here, that up until, say, three years ago, among the major tech companies that had this technology, that were playing with this technology internally, there was maybe a gentleman's agreement to only get it out into the world very, very slowly? But that's gone now. The shackles are off. They're all desperately competing to increase their share price, because they're in a bit of a war for victory against each other.

Yoshua Bengio: Exactly, and that is the reason why I have launched this new organization, LawZero, which is going to investigate how we can train AI that will be honest, and not deceptive, and not trying to escape. That organization is non-profit, so it can focus on the safety questions and not be part of this race to build the most capable AI.

Faisal Islam: Are you a bit disappointed by the tech companies? They had been sort of reining it in a bit, hadn't they, up until probably the release of ChatGPT. But it's inevitable, isn't it, that they were going to end up fighting each other to just try and get the best technology out there, get it to market and make as much money as possible?
Yoshua Bengio: I think they're stuck in, you know, market dynamics. Many of the individuals actually do understand the risks as well as I do, but it's like a survival thing: if you are a company and you're in this harsh competition, you have to keep competing and staying on the edge, and then they're going to have to cut corners on things like safety. It's just that, the way the incentives are structured right now, we get those results.

Faisal Islam: I was demoed one of the newest generation of AIs, and I was really struck by how intimate the relationship will be between the user and the AI: they will see what your phone can see, they will share all the information of everything you've done, they will talk to you with a voice that's sympathetic and realistic. At that level people will just give all their information over, and they'll start to see the AI as a friend. Does that create new dangers too?

Yoshua Bengio: Yes, absolutely. We're going to potentially trust them too much. There might even be people who are going to ask that we give them rights, and, you know, in principle I am not against something like this, but the problem is that the most fundamental right is the right to life. And if we are in a situation where we don't know if they're going to turn against us, if they want to escape, if they see us as a threat, we shouldn't be taking those risks.

Faisal Islam: So let me just understand what you just said there: you think that we may be near to a situation where the AI is so advanced that it might ask for, or we might have to give it, legal rights sort of akin to human rights?

Yoshua Bengio: Well, I would not advocate that, but I think some people will, and I've heard a lot of people who already feel that they're talking to sentient, conscious entities. After a while, if you have a dialogue with these systems and they get to know you, it really feels like that.

Faisal Islam: Wow. I mean, I've seen things on the way to that, but it's quite something hearing that from you. Have nation states, have governments, that you have worked for and have lobbied, and that have held a variety of summits and commissioned special reports, are they really alive to this, and are they acting in the way that you think is necessary?

Yoshua Bengio: By far not sufficiently, and I think a big part of it is that, like everyone else, they don't take the perspective that we are building machines that are getting smarter and smarter, eventually smarter than us. They don't take that perspective seriously enough, because it sounds like science fiction. All the things we've been discussing sound like science fiction, but unfortunately the scientific data is that we're moving in that direction. And for a government it is really important to prepare, to anticipate, so that we can put in place the right incentives for companies, maybe do the right research and development. For example, the sort of research that we're doing in my group at LawZero, right now very few people do it, because there's not enough incentive for that. We need, of course, governments to put in the right guardrails, societal guardrails, regulation, or whatever way we want to incentivize companies to protect the public. Right now that hasn't been happening. So we need better awareness, better understanding of those scenarios, even if we're not sure. There's a thing called the precautionary principle, which says that if we're going to do something that could be very bad, and it is one of the plausible scenarios, then we ought to be very cautious. We apply this in biology, we apply this in climate science, but in AI the forces of greed and profit and competition between nations are really making it very difficult to do the wise thing.

Faisal Islam: Let me try and put a counterpoint to you. Obviously the amazing economic opportunity, among some economies that have failed to grow, is there to be grasped, not least in a place like the UK, where obviously DeepMind was pioneered. If you're the UK government, yes, you don't want it to cause an existential risk, but you really do have to double down on trying to harness it the best you can to boost the economy, particularly when the UK could be a bit of a relative winner in this new AI economy.

Yoshua Bengio: Yes. In fact, the UK and other countries could also be relative losers if somehow these countries don't manage to compete against the leading companies that are outside. The thing is, we have to somehow plot a course which takes into account all the risks: the economic risk that you're mentioning, but also the existential risk of losing control, the risk of terrorists using this against us, the risk of other countries, adversarial countries, using it in a military way. We somehow need to chart a set of policies that deal with all of these things. If we just focus on one and ignore the others, then we are in trouble. So, for example, sure, governments should help to accelerate deployment, but deployment of what? We can have influence on where the research is happening, so that it's going to be ethical, it's not going to violate human rights, and we're not going to lose control of it. It's not incompatible. In other sectors in history, think about cars, planes, trains, drugs, we have had innovation and safety and regulation. You need both for products to actually serve the public and be useful, and we can do it again.

Faisal Islam: Isn't there a danger, in focusing on the existential risks, that more real risks that are much closer get overlooked? Like, for example, the fact that legal firms in the UK have now stopped employing as many trainees, because they can do their basic legal prep work all just using specialized AI programs; the creative industries that never thought they were going to be impacted; entry-level jobs being wiped out, writing copy, doing first-level design. This is very real and is going to hit Western economies perhaps far more quickly than they are taking account of.

Yoshua Bengio: Yeah, you're absolutely right. As I was saying, we need to handle all of these risks. Labor market effects are very, very important in people's minds and they will matter for our economies, but we also need to make sure we can reap the profits from that automation. We need to make sure that the AIs that are going to be deployed will not create dangerous accidents or become rogue AIs. We need to make sure that the most dangerous AIs don't end up in the hands of crazy sects who could start new pandemics. We can't just ignore a bunch of risks; we have to take all of them.

Faisal Islam: If we were to focus in the West on the worst case scenarios, and competitors like China were to focus on the best case scenarios, then are we not going to lose out?
don’t want terrorists to use AI against them they also don’t want to lose uh to a rogue AI at some point um so I I think once we put it in those terms there’s a way to you know get around the same table and and um and craft together policies that that work for everyone okay hey just one of your co- signatories on that letter um a couple of years ago was Elon Musk who was pretty skeptical at that point or skeptical about AI and its safety potentially but now he’s obviously one of the major investors in AI models and LLMs he’s just had a big split with President Trump do you think he he is still a a voice for AI safety or is he now basically trying to you know make as much money as possible um well I I’m not privy to what he has been sort of saying behind closed doors in the White House or something but um not so long ago last uh September he supported the California uh proposed bill for u managing the risks of advanced AI so I think he still uh considers the these risks uh the catastrophic risks of various kinds as serious uh issues that require regulation even though he’s more of a libertarian because he considers these risks to be serious uh he’s been supporting regulation and and and just wrap it up you know in very clear terms what is the worst case scenario that you’re worried about well the worst case scenario is uh human extinction uh a lot of the CEOs and leading AI researchers including myself Jeff Hinton um have signed a declaration saying that making sure we mitigate that existential risk should be a priority unfortunately the the global discussion has been going in in the reverse direction but at the same time the scientific evidence is mounting that we are building more and more these AIs that uh seem to have deceptive intentions and behavior and one that preserve themselves at at the expense of our moral instructions and and to be clear what you have seen in the technology that has been released since you first made these warnings is making you more worried oh yeah yes in fact I have reduced my expectations for when we will get to AGI or whatever human level uh I used to think it would be 5 to 20 years and now I think it could be you know two to 10 years two years wow i mean two would be the lower the smallest but but uh um it could be five it could be 10 really we should plan considering all these possibilities and in particular the shorter one just in case because that could be catastrophic and the world is ready for a potential AGI as in human level AI intelligence within two years not at all not at all not at all in in so many ways um in and and I think the the main obstacle is global public awareness if you think of how quickly government have moved after the beginning of the pandemic well they could have moved faster but but it was pretty fast right um because people understood it was a real major risk and they were willing to be agile and and do things in an unusual way but that’s the way we should be thinking about some of the catastrophic risks of AI and just new things have developed i just leave it here you’re really making me think here but like we now hear of AIs making medicines designing medicines and if they are good AIs all well and good incredibly efficient process if there’s any type of benign intent that you say is possible to replicate at least experimentally there’s a little worry there is there not absolutely the biggest worry in terms of malicious use of AI uh is uh that it’s becoming easier and easier for malicious actors to create pandemics in fact I 
I learned recently about absolutely terrifying possibility that uh you know one can design bacteria that have all their molecules uh reverted in the you know left right so that they would be completely invisible from for our immune system uh and so we would just basically be eaten alive and there’s like basically no cure except changing our DNA so if if that sort of thing becomes easy to do by anybody in you know a few years the this is these are risks we cannot afford thank you very much uh Yoshu Benjio for for joining us and uh retaining your voice on this very important issue i know I occasionally sounded a note of skepticism there but um yeah I I I I remain rather humbled by your your explanation of this and uh we’ll continue talking to you thanks so much professor for for joining us on BBC News