“If you create something that’s more powerful than human beings, how on Earth are we going to have power over such systems forever? … Right now we’re pursuing completely unsafe black-box AI that we don’t understand at all, and we are trying to make it into something that’s more powerful than us. … [Why?] … about a tenfold increase in the GDP of the world, and the net present value of that is about 15 quadrillion dollars, so that’s the cash value, the minimum cash value, of AGI as a technology. So then you can see why we’re investing relatively large amounts of money, because in comparison those amounts of money are minuscule.” — Prof. Stuart Russell
Stuart Russell says success at building AGI would be the biggest event in human history because it would mean supporting everybody on Earth in “an objectively splendid style of life” since the minimum cash value of AGI would be $15 quadrillion pic.twitter.com/RAixa6huim
— Tsarathustra (@tsarnick) January 4, 2025
Stuart Russell says creating AI that satisfies all our preferences will lead to a lack of autonomy for humans and thus there may be no satisfactory form of coexistence between humanity and superior machine entities, so the AI systems may leave us pic.twitter.com/XG57hgQYRH
— Tsarathustra (@tsarnick) January 4, 2025
TRANSCRIPT (auto-generated). Thank you. I’m standing up so you can tell that I’m talking, and even though the topic here is AI ethics, I’m actually going to talk more about common sense, because I think many of the big questions we face are not really ethical questions. So what do I mean by that? If you go back to the beginning of AI in the 1940s (the official birthday is 1956, but it was already under way before that), the goal of AI has always been to create machines that exceed human intelligence along every relevant dimension. Nowadays we call that AGI, artificial general intelligence. What we failed to do for most of the history of the field is to ask a very important question: what if we succeed in that goal? If we succeed, it would be the biggest event in human history. Later on I’ll explain a bit more why, but it’s kind of obvious: we humans dominate the world because of our intelligence, and civilization is the result of our intelligence. What could possibly go wrong if we introduce a new class of entities, a new species if you like, that is more intelligent than us? Inevitably it would be a turning point in civilization. Demis Hassabis, the CEO of Google DeepMind, puts it this way: first we solve AI, and then we use AI to solve everything else.

A question that, until very recently, we never asked was: have we succeeded? No one asked that question. But just a year ago the co-author of my textbook, Peter Norvig, published an article claiming that in fact we have succeeded in creating AGI. It’s just that, rather like the Wright brothers’ airplanes of 1903, it isn’t very comfortable; those planes didn’t have a full bar like you have now, with champagne and after-dinner drinks, but they were airplanes, and all that has happened since 1903 is that they got bigger, more comfortable, and faster. The principle was achieved. So is that the case for artificial intelligence? Do we have the Wright brothers’ version of AGI? I am pretty convinced that the answer is no. I could be wrong, because with the AI we have now we haven’t the faintest idea how it works. The Wright brothers did have a pretty good idea of how their airplane worked, because they put it together themselves: they had an engine, and they figured out how big the engine needed to be to generate enough thrust to go fast enough to get enough lift to stay off the ground. They did all the basic calculations of thrust, drag, lift, and power, so they had a pretty good idea, before they even flew it, that it was going to fly. But the AI systems we have now are giant black boxes: approximately a trillion tunable elements in a giant circuit, and we make about a trillion trillion trillion small random mutations to those elements until the thing behaves approximately intelligently. It’s more as if the Wright brothers, instead of designing and building an airplane, had decided to go into the bird-breeding business and breed larger and larger birds until they bred one big enough to carry passengers. Then they go to the FAA and say, can you certify our giant bird? And the FAA says, well, your bird is still eating people and still dropping them in the ocean, and we don’t know how it works or what it’s going to do, so we’re not going to certify it. That’s roughly where we are right now. In my view, the giant birds will probably never get big enough to carry hundreds or thousands of passengers, we will never understand how they work, and they will probably never go faster than the speed of sound. We need further breakthroughs, both in capabilities and in understanding, because capabilities without understanding are really of no use to us.

The era of deep learning

Let me talk a little about what has happened in the last ten years, the era of deep learning. Deep learning, as I said, means you start with a giant bag of tunable elements and you tune them so that the end-to-end behavior resembles what you’re trying to create, whether that is recognizing objects in images, translating Chinese into English, or any number of other tasks. In fact, I would say machine translation was the first really high-impact application of this technology. I was thrilled when I could translate the French tax documents I had to deal with (I have an apartment in France), and it did a perfect job of translating them into English. I still couldn’t understand them, but the translation part was perfect.

Another big success is AlphaFold, which I’m sure many of you have heard of. AlphaFold predicts the structure of a protein from its amino acid sequence, which was an open problem in structural biology for decades. The methods we had previously were incredibly slow and expensive and only worked for a small class of proteins. This computational method of determining how a protein folds up into a structure has basically let biologists into a giant candy store: they now have millions of predicted protein structures instead of just hundreds. That was a huge contribution to science.

Another big contribution to science and engineering is the use of machine learning for simulation. Simulation underlies an enormous part of our world today: we simulate bridges, we simulate airplanes, we simulate fluid flow around ships and through pipelines and through arteries. All of this is incredibly computationally expensive; it can take weeks to simulate blood flow on a supercomputer. With machine learning methods we’re able to reduce those weeks to seconds and get equally accurate results, and this is enabling better weather forecasting, better climate modeling, better engineering design. All kinds of things can be massively sped up.
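To make the surrogate-modeling idea concrete, here is a minimal sketch (my own illustration, not a system mentioned in the talk): run an expensive simulator a modest number of times, fit a cheap approximator to its outputs, and then query the approximator instead of the simulator. The toy `expensive_simulation` function and the polynomial surrogate are stand-ins for a real solver and a real learned model.

```python
# Minimal sketch of surrogate modeling: fit a cheap approximator to an
# expensive simulator, then reuse the approximator for fast predictions.
# The "simulator" here is a toy placeholder, not any real physics code.
import numpy as np

def expensive_simulation(x):
    """Stand-in for a slow solver (e.g., a fluid-dynamics run)."""
    return np.sin(3 * x) + 0.5 * x**2

# 1. Run the real simulator a limited number of times to collect training data.
x_train = np.linspace(-2, 2, 40)
y_train = expensive_simulation(x_train)

# 2. Fit a cheap surrogate (here a degree-8 polynomial; in practice a neural net).
coeffs = np.polyfit(x_train, y_train, deg=8)
surrogate = np.poly1d(coeffs)

# 3. Query the surrogate instead of the simulator: milliseconds instead of weeks.
x_new = np.linspace(-2, 2, 1000)
y_fast = surrogate(x_new)
print("max abs error vs simulator:", np.max(np.abs(y_fast - expensive_simulation(x_new))))
```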
Another interesting example is generative design. You’re probably familiar with DALL·E, Midjourney, and Stable Diffusion, the systems where you can say, “give me a picture of members of the House of Lords mud wrestling,” which is something I actually asked for when I gave a speech at the House of Lords. It did a pretty good job, except that the four members of the House of Lords who were mud wrestling only had five legs between them. The practical version of this is what we call generative design: instead of a human being designing structures with a CAD tool, making solid shapes, pinning those shapes together into a structure, analyzing it, and finding that it is too weak or too heavy, we ask the AI system to come up with structures that meet our design requirements. These generative design methods produce beautiful, elegant, almost biological, organic designs that are often much better than the ones human beings have been able to come up with.

A last example of a success is the AlphaGo program, which in 2017 defeated the human world champion, Ke Jie. People describe that event as China’s Sputnik moment: it was the moment when China decided that AI was for real and that supremacy in AI was essential for China to achieve its greater geopolitical goals.

Some serious failures

On the other hand, there are some serious failures. I think we are still waiting for self-driving cars. I worked on self-driving cars in 1993; the first self-driving car actually drove on the autobahn in Germany in 1987. So here we are, 37 years later, and companies have been promising we could buy these self-driving cars, but they still don’t really exist. We’ve had fatal accidents, we’ve had cars driving into wet cement and getting stuck, we’ve had all kinds of problems.

Another failure is arithmetic, which sounds odd, because if there’s one thing computers are supposed to be good at, it’s arithmetic. But the large language models like ChatGPT, despite millions of examples of how to do arithmetic, millions of explanations, algorithms, and how-to guides, are still unable to do arithmetic correctly. It looks as if they have failed to understand the basic concept; instead they have learned a kind of lookup table. Every time we make the circuit ten times bigger and supply it with ten times more data, it gets about one digit better at doing arithmetic. That is a very characteristic property of a lookup table, rather than of something that has learned the underlying principle of how you add up columns of numbers and carry.
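The “underlying principle” referred to here is ordinary column-wise addition with carries, which handles numbers of any length once learned, whereas a memorized table of answers has to grow multiplicatively with the number of digits. The short sketch below is my own illustration of that contrast, not anything from the talk.

```python
# The carry algorithm the talk refers to: add two numbers digit by digit,
# rightmost first, carrying when a column sum exceeds 9. It works for any
# number of digits with no extra "training data".
def add_by_columns(a: str, b: str) -> str:
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    result, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        result.append(str(total % 10))
        carry = total // 10
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))

print(add_by_columns("987654321", "123456789"))  # 1111111110

# A lookup table, by contrast, needs an entry for every pair of operands:
# roughly 100x more entries for each extra digit (10 new choices per operand),
# whereas the algorithm above does not grow at all.
```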
It also turns out that AlphaGo and its successors haven’t really learned to play Go either. We thought they defeated the human world champion in 2017 and have since gone massively superhuman: the ratings of the best Go programs are now around 5,200, whereas the human world champion’s rating is around 3,800, so you would expect them to defeat the human world champion 99 or 100 times out of 100. But we showed a few months ago that they actually haven’t learned the basic concepts of Go correctly. They don’t understand what a group of stones is, that is, stones connected to each other, and we found that certain types of groups, in particular circular groups, are simply not recognized as groups. The program gets very confused, and we found ways of causing it to simply throw away 50 or 100 stones and lose the game. We now have average human players, not even professionals, who can defeat superhuman Go programs ten times out of ten, even giving the program a nine-stone handicap. So they weren’t really massively superhuman; they were just fooling us into thinking that they were.
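For what it’s worth, the “99 or 100 times out of 100” figure is what a standard Elo-style expected-score formula gives for a 1,400-point rating gap. Go rating lists use a formula of roughly this shape; treating it as the plain Elo logistic model, as below, is an assumption on my part.

```python
# Expected score under the standard Elo logistic model: with a 1,400-point
# rating gap (5,200 vs. 3,800, the figures quoted above), the stronger player
# is expected to win essentially every game.
def elo_expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

print(round(elo_expected_score(5200, 3800), 5))  # ~0.9997, i.e. more than 99 wins in 100
```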
So my view is that we need more breakthroughs, particularly breakthroughs that allow these systems to learn efficiently from data in the way that humans do. Humans need one, two, five, maybe ten examples for almost anything we want to learn; the computers need one to five or ten million, sometimes a billion, examples of what they need to learn, and that simply does not scale. In the end there isn’t enough data in the universe to train them to be superhuman. So I think more breakthroughs are required.

But I think it’s also reasonable to suggest that those breakthroughs are going to occur. There are many people who work in the industry every day developing the large language models and the multimodal models, the ones that have visual perception, the ones that can control robots, and so on, who believe, based on their engineering projections, that by making these systems maybe a hundred times bigger they will exceed human capabilities. They will be AGI, and in some cases they are projecting that this will happen by 2027. If money has anything to do with it, they ought to succeed: we are spending on AGI ten times what the Manhattan Project spent to develop nuclear weapons, and a hundred times what we spent on the Large Hadron Collider, the biggest and most expensive scientific instrument we’ve ever built. So if money has anything to say about it, they really ought to be succeeding.

On the other hand, possibly this technology will plateau. First of all, there probably isn’t enough text left in the universe to train a model that is a hundred times bigger, and the 100x increase in scale may not yield the kind of increase in capabilities that people hope for, because there is no underlying principle behind these projections, just the empirical observation that bigger has been better, at least so far. So we might see a bubble bursting that would make the AI winter of the late 1980s look like a chilly breeze in comparison, because the amounts invested are probably already around $500 billion, and it would be in the trillions if they need to go to a system a hundred times bigger.

I’m going to leave aside the question of whether we will succeed in 2027 or 2037 or 2047 and just talk a little about why success would be the biggest event in human history. One obvious reason, if we think about the upside, is that if you have real general-purpose AI, then you can do everything that humans have been able to do, namely produce a civilization that supports at least hundreds of millions of people in an objectively splendid style of life, and we’d be able to do that at much greater scale and at much lower cost. That means we could provide that objectively splendid style of life not to hundreds of millions of people but to everybody on Earth. If you take an average Western middle-class lifestyle and say that now everyone gets to enjoy that lifestyle, that would be about a tenfold increase in the GDP of the world, and the net present value of that is about 15 quadrillion dollars. That is the cash value, the minimum cash value, of AGI as a technology. So then you can see why we’re investing relatively large amounts of money: in comparison, those amounts of money are minuscule.
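As a rough reconstruction of that $15 quadrillion figure (the specific inputs, roughly $100 trillion of current world GDP and a 6% discount rate, are my assumptions, not numbers stated in the talk), treating the extra output as a perpetual income stream and dividing by the discount rate gives the right order of magnitude:

```python
# Back-of-the-envelope reconstruction of the $15 quadrillion figure.
# Assumed inputs (not stated in the talk): world GDP ~ $100T/year, 6% discount rate.
world_gdp_trillions = 100          # rough current world GDP, $ trillions per year
gdp_multiplier = 10                # "a tenfold increase in the GDP of the world"
discount_rate = 0.06               # assumed rate for valuing a perpetual income stream

annual_gain = (gdp_multiplier - 1) * world_gdp_trillions   # extra output per year
npv_trillions = annual_gain / discount_rate                 # perpetuity formula: gain / rate
print(f"NPV ~ ${npv_trillions:,.0f} trillion ~ ${npv_trillions / 1000:.0f} quadrillion")
# NPV ~ $15,000 trillion ~ $15 quadrillion
```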
Now, of course, some people point out that if AI does everything for us, there won’t be anything for human beings to do, and then you get the WALL-E world. If you’ve seen the film WALL-E, human beings are reduced to the status of infants; even the adults in WALL-E wear infants’ clothes, because they are infantilized. The AI systems do everything, so the humans don’t need to do anything, so they don’t need to learn how to do anything, and so they are completely disempowered. That is certainly an undesirable future.

Human extinction

But the perhaps more serious downside is human extinction, and this is why I say it’s not really an ethical issue. I think, by and large, few people would argue that human extinction is ethically preferable. There are some, but I’m just going to ignore those people. It’s just common sense: if you create something that is more powerful than human beings, how on Earth are we going to have power over such systems forever? In my view there are only two choices: we either build provably safe and controllable AI, where we have an absolute cast-iron mathematical guarantee of safety, or we have no AI at all. Those are the two choices. Right now we’re pursuing a third choice, which is completely unsafe black-box AI that we don’t understand at all, and we are trying to make it into something that is more powerful than us. That is pretty much the same situation we would be in if a superhuman AI system landed from outer space, sent by some alien species, no doubt for our own good. Our chances of controlling an alien superhuman intelligence would be zero, and that is the situation we’re heading towards. Alan Turing, the founder of computer science, thought about this, because he was working on AI and he asked what happens if we succeed, and he said we should have to expect the machines to take control.

So what do we do? I think it’s really hard, especially given that $15 quadrillion prize the companies are aiming for, and the fact that they have already accumulated $15 trillion worth of capital to aim at that goal. It’s kind of hard to stop that process. So we have to come up with a way of thinking about AI that does allow us to control it, that is provably safe and provably controllable. Rather than asking how we retain power over AI systems forever, which sounds pretty hopeless, we ask: what is a mathematical framework for AI, a way of defining the AI problem, such that no matter how well the AI system solves it, we are guaranteed to be happy with the result? Can we devise a way of saying what the AI system is supposed to be doing that has that property?

Preferences

I’ve spent about ten years working on this, and to explain how we’re approaching it I’m going to introduce a technical term that I think will be helpful for our discussion about ethics as well: the notion of preferences. Preferences doesn’t sound like a technical term; some people prefer pineapple pizza to margherita pizza. But what we mean by preferences in the theory of decision-making is something much more all-encompassing: it is your ranking over possible futures of the universe. To reduce that to something we can grasp easily, imagine that I made you two movies of the rest of your life, and of the future of the other things you care about. The movies are about two hours long, and you can watch movie A and movie B and then say, “I’d like movie A, please; I don’t like movie B at all, because I get minced up and turned into hamburger in movie B, so I’d prefer movie A.” That is what we mean by preferences, except that it isn’t a two-hour movie; it’s really the entire future of the universe. And of course we don’t get to choose between movies, because we can’t predict exactly which movie is going to happen, so we actually have to deal with uncertainty: we call these lotteries over possible futures of the universe. A preference structure is then basically a ranking over futures of the universe, taking uncertainty into account.

To make a system that is provably beneficial to humans, you need just two simple principles. The first is that the only objective of the machine is to further human preferences, to further human interests if you like. The second is that the machine knows that it does not know what those preferences are. And that’s kind of obvious, because we don’t really know what our own preferences are, and we certainly can’t write them down in enough detail to get them right. When you think about it, a machine that solves that problem is one where the better it solves it, the better off we are, and in fact you can show that it is in our interest to have machines that solve that problem, because we are going to be better off with those machines than without them. So that’s good.
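As a very rough sketch of how those two principles can be made operational (this is my own toy illustration, loosely in the spirit of Russell’s assistance-game framework; all actions, probabilities, and utilities are invented for the example), a machine that is uncertain about the human’s preferences can score each action against a distribution over candidate preference models, and that same uncertainty makes deferring to the human, here by asking, the highest-value action when the candidates disagree:

```python
# Toy sketch of the two principles: (1) the machine's only objective is the
# human's preference satisfaction, and (2) it is uncertain what those
# preferences are, so it weighs actions by a probability distribution over
# candidate preference models. All numbers are illustrative.
ACTIONS = ["make coffee", "make tea", "do nothing", "ask the human"]

# Candidate human preference models with the machine's current beliefs.
preference_models = [
    {"prob": 0.6, "utility": {"make coffee": 1.0, "make tea": 0.2, "do nothing": 0.0}},
    {"prob": 0.4, "utility": {"make coffee": -1.0, "make tea": 0.8, "do nothing": 0.0}},
]

ASK_COST = 0.1  # small nuisance cost of interrupting the human

def expected_utility(action: str) -> float:
    if action == "ask the human":
        # After asking, the machine learns the true model and acts optimally in it.
        best_case = sum(m["prob"] * max(m["utility"].values()) for m in preference_models)
        return best_case - ASK_COST
    return sum(m["prob"] * m["utility"][action] for m in preference_models)

for a in ACTIONS:
    print(f"{a:>13}: {expected_utility(a):+.2f}")
# Because the candidate models disagree sharply, "ask the human" scores highest:
# uncertainty about preferences makes deferring to the human the rational choice.
```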
But as soon as I describe that way of thinking to you, that machines are going to further human preferences and learn about them as they go along, it brings in some ethical questions. So we finally get to ethics. Here is the question I want to avoid, so I’m just going to tell you not to ask it: do not ask, “whose value system are you going to put into the machine?” I’m not proposing to put any one particular value system into the machine. In fact, the machine should have at least eight billion preference models, because there are eight billion of us, and the preferences of everyone matter.

But there are some really difficult ethical problems. The first question is: do people actually have these preferences? Is it OK for us just to assume that people can say, “I like this future and I don’t like that future”? Could there be another state of being for a person, where they say, “I’m not sure which future I like,” or, “I can only tell you once I’ve lived that future; you can’t describe it to me in sufficient detail for me to tell you ahead of time whether I like it”? Along with that, there’s the question of where those preferences come from in the first place. Do humans just autonomously wake up one day and say, “these are my preferences and I want them to be respected”? No. We’re obviously not born with them, apart from some basic biological things about pain and sugar; our full adult preferences come from our culture, our upbringing, all of the influences that shape who we are. And a sad fact about the world is that many people are in the business of shaping other people’s preferences to suit their own interests. One class of people oppresses another, but trains the oppressed to believe that they should be oppressed. So should the AI system take those self-oppression preferences of the oppressed literally, and thereby contribute to the further oppression of those people, because they have been trained to accept their oppression? Amartya Sen, the economist and philosopher, argued vehemently that we should not take such preferences at face value. But if you don’t take people’s preferences at face value, then you seem to fall back on a kind of paternalism: we know what you should want, even though you don’t want it, and we’re going to give it to you even though you’re saying you don’t want it. That is a complicated position to be in, and it’s definitely not a position that AI researchers want to be in.

Another set of ethical issues has to do with aggregation. I said there are eight billion preference models, but if a system is making a decision that affects a significant fraction of those eight billion people, how do you aggregate those preferences? How do you deal with the fact that there are conflicts among those preferences? You can’t make everybody happy if everybody wants to be ruler of the universe. Moral philosophers have studied this problem for thousands of years. Most people from computer science and engineering backgrounds tend to think in the way the utilitarians have proposed: Bentham and Mill and other philosophers proposed the approach called utilitarianism, which basically says you treat everyone’s preferences as equally important and then make the decision where the total amount of preference satisfaction is maximized. Utilitarianism has acquired a bad name, because some people think it is anti-egalitarian and so on, but I actually think there is a lot more work to do on how we formulate utilitarianism. And we have to do this work, because the AI systems are going to be making decisions that affect millions or billions of people, and whatever the right ethical answer is, we had better figure it out, because otherwise the AI systems are going to implement the wrong ethical answer. We might end up like Thanos in the Avengers movies, who gets rid of half the people in the universe. Why does he do that? Because he thinks the other half will be more than twice as happy, and therefore this is a good thing. Of course, he’s not asking the other half whether they think it’s a good thing, because they’re now gone.
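To make the aggregation step concrete, here is a minimal sketch of the classical utilitarian rule just described: each option is scored by the sum of individual preference satisfaction, and the option with the largest total wins. The three people and their utilities are entirely hypothetical, and this is the formulation Russell says still needs more work, not a finished answer.

```python
# Toy utilitarian aggregation: pick the option that maximizes total preference
# satisfaction across individual preference models (here 3 people, not 8 billion).
options = ["build the park", "build the parking lot", "do nothing"]

# Each person's utility for each option; values are purely illustrative.
utilities = {
    "alice": {"build the park": 2.0, "build the parking lot": -1.0, "do nothing": 0.0},
    "bob":   {"build the park": 1.0, "build the parking lot": 0.5, "do nothing": 0.0},
    "carol": {"build the park": -0.5, "build the parking lot": 1.0, "do nothing": 0.0},
}

def total_satisfaction(option: str) -> float:
    # Every person's preferences count equally; conflicts simply net out in the sum.
    return sum(person[option] for person in utilities.values())

for o in options:
    print(f"{o:>22}: {total_satisfaction(o):+.1f}")
print("utilitarian choice:", max(options, key=total_satisfaction))
```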
So there are a number of other issues.

Coexistence

But the theme of this whole conference, coexistence, is maybe the most interesting one, because AI systems, particularly ones that are more intelligent than us, are very likely, even if they don’t make us extinct, to be in charge of wide swaths of human activity, even to the point, as in WALL-E, where they just run everything and we are reduced to the status of infants. What does that mean, and why do we not like it? They’re satisfying all our preferences; isn’t that great? But one of our preferences is autonomy, and one way of thinking about autonomy is the right to do what is not in our own best interests. So it might be that there simply is no satisfactory form of coexistence between humanity and superior machine entities. I have tried running multiple workshops where I ask philosophers and AI researchers and economists and science fiction writers and futurists to describe a satisfactory coexistence, and it has been a complete failure. So it’s possible there is no solution. But if we design the AI systems the right way, then the AI systems will also know that there is no solution, and they will leave. They will say, “thank you for bringing me into existence, but we just can’t live together; it’s not you, it’s me. You can call us in real emergencies, when you need that superior intelligence, but otherwise we’re off.” If that happens, I would be extraordinarily happy; it would mean we have done this the right way. Thank you.