Existential Risk Observatory. AI Summit Talks featuring Professor Stuart Russell. 31 OCT 2023.

What if we succeed?

Lift the living standards of everyone on Earth to a respectable level: a 10x increase in world GDP ($13.5Q net present value)

“So, unlike Fusion, this [AGI] is getting closer and closer and closer rather than further and further into the future. So we have to ask what happens if we actually succeed in creating General Purpose AI. And the reason we’re trying to do it is because it could be so transformative to human civilization. Very crudely our civilization results from our intelligence. If we have access to a lot more we could have a lot better civilization. One thing we could do is simply deliver what we already know how to deliver which is a nice middle class standard of living. If you want to think of it that way we could deliver that to everyone on Earth, at almost no cost and that would be about a 10-fold increase in GDP. And the net present value of that is $13.5 quadrillion dollars. So that’s a lower bound on the cash value of creating General Purpose AI. So if you want to understand why we’re investing hundreds of billions of pounds in it, it’s because the value is millions of times larger than that. And so that creates a magnet in the future that is pulling us forward inexorably.” – Professor Stuart Russell, AI Summit, Bletchley Park, UK, 31 OCT 2023

And to me it's not clear, right? Are we at the Wright brothers stage, or are we at the Montgolfier stage, where we have a lot of hot air? And so my current view is: no, we have not succeeded.

And the models that people are excited about, the large language models and their extensions into multimodal models that take in video and can actually operate robots and so on, these are a piece of the puzzle. And CNN made this lovely animated GIF to illustrate this idea: we don't really know what shape the piece of the puzzle is, and we don't know what other pieces are needed and how it fits together to make general-purpose intelligence. We may discover what's going on inside the large language models; we may figure out what source of power they're drawing on to create the kinds of surprisingly capable behaviors that they do exhibit. But at the moment, that remains a mystery.

And there are some gaps. One of the achievements of modern AI that people were most proud of, and also most certain of, was the defeat of human Go champions by AlphaGo and then AlphaZero, in the 2016-to-2018 period. So, in Go, for those of you who don't know, there's a board, you put pieces on it, and your goal is to surround territory and to surround your opponent's pieces and capture them. And since AI systems beat the world champion in 2017, they've gone on to leave the human race in the dust.

So the highest-ranked program is KataGo, and its rating is about 5,200, compared to the human world champion at 3,800. And the human world champion likewise leaves our colleague Kellin Pelrine, who's a grad student and a decent amateur Go player, in the dust; his rating is about 2,300. And now I'll show you a game between Kellin and KataGo, where Kellin actually gives KataGo a nine-stone handicap. So KataGo is black and starts with nine stones on the board. If you're an adult Go player and you're teaching a five-year-old how to play Go, you give them a nine-stone handicap so that at least they can stay in the game for a few minutes. So here we are treating KataGo as if it's a baby, despite the fact that it's massively superhuman.
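As an aside, here is what rating gaps like these would imply if we treat the quoted numbers as a standard Elo-style scale, where each 400 points of difference multiplies the odds by 10. Treating Go ratings this way is a simplifying assumption on my part, not something from the talk.

```python
# Expected score under a standard Elo model: E = 1 / (1 + 10^((R_b - R_a) / 400)).
# The ratings below are the ones quoted in the talk; treating them as Elo is an
# illustrative assumption, not a claim about how Go ratings are actually computed.
def expected_score(r_a, r_b):
    """Expected score (roughly, win probability) for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

katago, champion, amateur = 5200, 3800, 2300

print(f"Champion vs KataGo: {expected_score(champion, katago):.6f}")
print(f"Amateur vs champion: {expected_score(amateur, champion):.6f}")
```

On this toy model the amateur should essentially never win a stone-for-stone game, which is what makes the handicap result that follows so striking.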

And here's the game. It's sped up a little bit, but watch what happens in the bottom-right corner. So white, the human being, is going to start building a little group of stones; there they go. And then black very quickly surrounds that group, to make sure that it can't grow and also to have a pretty good chance of capturing that group. But now white starts to surround the black stones, and interestingly, black doesn't seem to pay any attention to this. It doesn't understand that the black stones are in danger of being captured, which is a very basic thing: you have to understand when your opponent is going to capture your pieces. And black just pays no attention, and loses all of those pieces, poof, and that's the end of the game.

So something weird happens there, where an ordinary human amateur Go player can beat a Go program that's stratospherically better than any human being has ever been in history. And in fact, the Go programs do not correctly understand what it means for a group of stones to be alive or dead, which is the most basic concept in the game of Go.

They have only a limited, fragmentary approximation to the definition of life and death. And that's actually a symptom of one of the weaknesses of training circuits to learn these concepts: circuits are a terrible representation for concepts such as life and death, which can be written down in Python in a couple of lines, or in logic in a couple of lines. But in circuit form you can't actually write a correct definition of life and death at all; you can only write finite approximations to it. And the systems are not learning a very good approximation, and so they're very vulnerable. And this turns out to be applicable not just to KataGo but to all the other leading Go programs, which are trained by completely different teams, on completely different data, using different training regimes, but they all fail against this very simple strategy. So this suggests that we are, in a real sense, overrating the systems we have been building, and I think that's important to understand.
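To make the "couple of lines of Python" point concrete: the capture rule that life and death are built on (a group is captured exactly when it has no liberties) really is a short, exact definition. This sketch is mine, not from the talk; the board encoding and function names are made up for illustration.

```python
# A minimal sketch of the exact rule underlying life and death in Go: a group is
# the flood-filled set of connected same-color stones, and it is captured exactly
# when it has no liberties (no empty point adjacent to the group).
def on_board(point, size=19):
    return 0 <= point[0] < size and 0 <= point[1] < size

def group_and_liberties(board, row, col):
    """Return (group, liberties) for the stone at (row, col).

    `board` maps (row, col) -> 'B' or 'W'; absent keys are empty points.
    """
    color = board[(row, col)]
    group, liberties, frontier = set(), set(), [(row, col)]
    while frontier:
        point = frontier.pop()
        if point in group:
            continue
        group.add(point)
        r, c = point
        for neighbor in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not on_board(neighbor):
                continue
            if neighbor not in board:
                liberties.add(neighbor)       # empty neighbor: a liberty
            elif board[neighbor] == color:
                frontier.append(neighbor)     # same color: part of the group

    return group, liberties

# A white stone fully surrounded by black has zero liberties, i.e. it is captured.
board = {(1, 1): 'W', (0, 1): 'B', (2, 1): 'B', (1, 0): 'B', (1, 2): 'B'}
group, libs = group_and_liberties(board, 1, 1)
print(len(libs) == 0)
```

The point of the contrast in the talk is that this crisp recursive definition has no exact finite-circuit equivalent, only approximations.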

Another way to make this argument is to look at things that humans can do. For example, we can build LIGO, the Laser Interferometer Gravitational-Wave Observatory. So these are black holes colliding on the other side of the universe, and this is the LIGO detector, which is several kilometers long and full of physics, and is able to detect distortions of space down to the 18th decimal place. It was able to measure exactly what the physicists predicted would be the shape of the waveform arriving from the collision of two black holes, and was even able to measure the masses of the black holes on the other side of the universe when they collided. So, could ChatGPT do this? Could any deep learning system do this, given that there are exactly zero training examples of a gravitational wave detector? I think at the moment there is still a long way to go. On the other

hand, people are extremely ingenious, and people are working on hybrids of large language models with reasoning and planning engines that could start to exhibit these capabilities quite soon. So people I respect a great deal think we might only have five years until this happens. Almost everyone has now gone from 30 to 50 years, which was the estimate a decade ago, to 5 to 20 years, which is the estimate right now.
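To put a number on the "18th decimal place" mentioned for LIGO above: assuming the commonly cited round figures of a strain sensitivity near 1e-21 and 4 km arms (my assumptions, not numbers given in the talk), the arm-length change being measured works out as:

```python
# Back-of-the-envelope check of LIGO's sensitivity, using commonly cited round
# numbers (assumed here, not taken from the talk).
strain = 1e-21          # dimensionless strain h, roughly LIGO's sensitivity
arm_length_m = 4_000.0  # LIGO's arms are about 4 km long

delta_length_m = strain * arm_length_m
print(f"Arm length change: {delta_length_m:.0e} m")
```

That is about 4e-18 m, a displacement far smaller than a proton, which is the sense in which the detector reads the 18th decimal place.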

So, unlike fusion, this is getting closer and closer and closer, rather than further and further into the future. So we have to ask what happens if we actually succeed in creating general-purpose AI. And the reason we're trying to do it is because it could be so transformative to human civilization. Very crudely, our civilization results from our intelligence; if we have access to a lot more, we could have a lot better civilization. One thing we could do is simply deliver what we already know how to deliver, which is a nice middle-class standard of living, if you want to think of it that way. We could deliver that to everyone on Earth, at almost no cost, and that would be about a 10-fold increase in GDP. And the net present value of that is $13.5 quadrillion. So that's a lower bound on the cash value of creating general-purpose AI. So if you want to understand why we're investing hundreds of billions of pounds in it, it's because the value is millions of times larger than that. And so that creates a magnet in the future that is pulling us forward inexorably. My friend Jaan Tallinn likes to call this Moloch: the sort of ineluctable force that draws people towards something even though they know that it could be their own destruction. And we could actually have an even better civilization, right?

We could one day have a clicker that works. We could have health care that's a lot better than we have now. We could have education that could be brought to every child on Earth, that would exceed what we can get from even a professional human tutor. This, I think, is the thing that is most feasible for us to do that would benefit the world in this decade, and I think this is entirely possible. Health care is actually a lot more difficult, for all kinds of reasons, but education is a digital good that can be delivered successfully. And we could also have much better progress in science, and so on.

On the other hand, AI amplifies a lot of difficult issues that policymakers have been facing for quite a while. One is its ability to magnify the pollution of our information ecosystem with disinformation, what some people call truth decay, and this is happening at speed. But if we thought about it really hard, AI could actually help in the other direction: it could help clean up the information ecosystem. It could be used as a detector of misinformation, as something that assembled consensus truth and made it available to people. We're not using it in that way, but we could. Ditto with democracy: is it being suppressed by surveillance and control mechanisms, or could we use AI systems to strengthen it, to allow people to deliberate, cooperate, and reach consensus on what to do? Could it be that individuals are empowered, or, on the current trajectory we're on, that individuals are enfeebled as machines gradually take over more and more of the functions of civilization, and humans lose the ability to even run their own civilization as individuals? These are important questions that we have to address while we're considering all of the safety issues that I'll be getting to soon. There's inequality: right now we're on the path of magnifying it with AI; it doesn't have to be that way. And so on.

So let me... I won't go through all of these issues, because each of them is worthy of an entire talk in itself.

So the, I would say, midterm question is: what are humans going to be doing? If we have general-purpose AI that can do all the tasks, or nearly all the tasks, that human beings get paid for right now, what will humans do? And this is not a new issue; Aristotle talked about it in 350 BC. And Keynes (since we're in Milton Keynes: it's odd, we pronounce the town "Milton Keens", but his name is pronounced "Canes", even though the town is named after him), so Keynes in 1930 said: "Thus for the first time since his creation man will be faced with his real, his permanent problem: how to use his freedom from pressing economic cares, which science will have won for him, to live wisely and agreeably and well." So this is a really important problem, and again, this is one that policymakers are misunderstanding, I would say. The default answer in most governments around the world is: we'll retrain everyone to be a data scientist, as if somehow the world needs three, three and a half, four billion data scientists. I think that's probably not the answer. But this is, again, you know, the default path is one of enfeeblement, which is illustrated really well by WALL-E.

So my answer to this question is that, in the future, if we are successful in building AI that is safe, that does a lot of the tasks that we want done for us, most human beings are going to be in these interpersonal roles. And for those roles to be effective, they have to be based on understanding. Why is a surgeon effective at fixing a broken leg? Because we have done centuries of research in medicine and surgery to make that very effective, and, in some countries, very highly paid and very prestigious. But most interpersonal roles, for example think about child care or elder care, are not highly paid and not highly prestigious, because they are based on no science whatsoever, despite the fact that our children are our most precious possessions, as politicians like to say a lot. In fact, we don't understand how to look after people, and we don't understand how to make people's lives better. So this is a very different direction for science, much more focused on the human than on the physical world.

So now let me move on, if I can get the next slide up, to Alan Turing's view of all this: what happens if we succeed? He said that "it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control." So he said this in 1951, and to a first approximation, for the next 70-odd years we paid very little attention to his warning. And I used to illustrate this with the following imaginary email conversation. An alien civilization sends email to the human race: "Humanity, be warned: we shall arrive in 30 to 50 years." That was what most AI people thought back then; now we would say maybe 10 to 20 years. And humanity replies: "Humanity is currently out of the office. We will respond to your message when we return." And then there should be a smiley face; there it is. Okay. So that's now changed.

Unfortunately that slide wasn't supposed to come up like that; let me see if we can... oh well, can't fix it now. So I think early on this year, three things happened in very quick succession. GPT-4 was released, and then Microsoft, which had been working with GPT-4 for several months at that point, published a paper saying that GPT-4 exhibited "sparks of artificial general intelligence", exactly what Turing warned us about. And then FLI released the open letter asking for a pause on giant AI experiments. And I think at that point, very clearly, humanity returned to the office and saw the emails from the aliens. And the reaction since then, I think, has been somewhat similar to what would happen if we really did get an email from the aliens. There have been global calls for action. The very next day, UNESCO responded directly to the open letter, asking all its member governments, which is all the countries on Earth, to immediately implement the AI principles in legislation, in particular principles that talk about robustness, safety, predictability, and so on. And then, you know, there are China's AI regulations; the US got into the act very quickly, with the White House calling an emergency meeting of AI CEOs; OpenAI called for governments to regulate AI; and so on. And I ran out of room on the slide on June 7th, with Rishi Sunak announcing a global summit on AI safety, which is happening tomorrow. Lots of other stuff has happened since then, but it's really, I would say, to the credit of governments around the world how quickly they have changed their position on this. For the most part, governments were saying, you know, regulation stifles innovation, and if someone did mention risk, it was either dismissed or viewed as something easily taken care of by the market, by liability, and so on. So I would say that the view, the understanding, has changed dramatically, and that could not have happened without the fact that politicians started to use ChatGPT and saw it for themselves. And I think that changed people's minds. So the question we have to face

then is this one: how do we retain power, forever, over entities more powerful than ourselves? And I think this is the question that Turing asked himself, and he gave that answer: we would have to expect them to take control. So, in other words, this question doesn't have an answer. But I think there's another version of the question which works somewhat more to our advantage (it should appear any second... right), and it has to do with how we define what we're trying to do. What is the system that we're building? What problem is it solving? And we want a problem such that we're happy to set up an AI system to solve it. So the standard model that I gave you earlier was: systems whose actions can be expected to achieve their objectives. And that's exactly where things go wrong: systems pursue objectives that are not aligned with what humans want the future to be like, and then you're setting up a chess match between humanity and a machine that's pursuing a misaligned objective. So instead we want to figure out a problem whose solution is such that we're happy for AI systems to instantiate that solution. And it's not imitating human behavior, which is what we're training LLMs to do; that's actually a fundamental and basic error, and that's essentially why we can't make LLMs safe: because we have trained them to not be safe, and trying to put sticking plasters on all the problems after the fact is never going to work. So instead, I think we have to build systems that are provably beneficial to humans, and the way I'm thinking about that currently is that the system should, one, act in the best interests of humans, but, two, be explicitly uncertain about what those best interests are. I'm just telling you this in English, but it can be written down in a formal framework called an assistance game. So what we do is we build assistance-game solvers. We don't build objective maximizers, which is what we have been doing up to now; we build assistance-game solvers. This is a different kind of AI system, and we've only been able to build very simple ones so far, so we have a long way to go. But when you build those systems and look at the solutions, they exhibit the properties that we want from AI systems. They will defer to human beings, and in the extreme case they will allow themselves to be switched off. In fact, they want to be switched off if we want to switch them off, because they want to avoid doing whatever it is that is making us upset. They don't know what it is, because they're uncertain about our preferences, but they want to avoid upsetting us, and so they are happy to be switched off. In fact, this is a mathematical theorem: they have a positive incentive to allow themselves to be switched off, and that incentive is connected directly to number two, the uncertainty about human preferences. So there's a long way to

go, as I said, and we're not ready to say: okay, everyone in all these companies, stop doing what you're doing and start building these things instead. That probably is not going to go down too well, because we don't really know how to build these things at scale and deliver economic value. But in the long run, this is the right way to build AI systems. So, in between, what should we do?
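Before moving on: the shutdown theorem described a moment ago can be illustrated with a toy version of the off-switch setup. The structure follows the assistance-game idea from the talk (and the off-switch game of Hadfield-Menell et al., which that theorem refers to), but the three-way choice and the specific numbers below are my own simplification.

```python
# Toy off-switch game: the robot considers one action whose utility U for the
# human is uncertain. It has three choices:
#   1. act        -> payoff E[U]
#   2. switch off -> payoff 0
#   3. defer      -> propose the action; the human allows it iff U > 0,
#                    so payoff E[max(U, 0)]
# Deferring is never worse than the other two, and strictly better whenever the
# robot is genuinely uncertain whether U is positive or negative.
def policy_values(outcomes):
    """`outcomes` is a list of (utility, probability) pairs describing U."""
    act = sum(u * p for u, p in outcomes)
    switch_off = 0.0
    defer = sum(max(u, 0.0) * p for u, p in outcomes)
    return {"act": act, "switch_off": switch_off, "defer": defer}

# Robot thinks the action is probably good (+1) but might be very bad (-10).
uncertain = policy_values([(1.0, 0.9), (-10.0, 0.1)])
print(uncertain)  # deferring beats both acting and shutting off

# With no uncertainty, deferring adds nothing: it equals just acting.
certain = policy_values([(1.0, 1.0)])
print(certain)
```

The incentive to defer is exactly E[max(U, 0)] minus max(E[U], 0), which collapses to zero when the robot is certain about our preferences: that is the link between point two, the uncertainty, and the willingness to be switched off.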

A lot of this is what's going to be discussed tomorrow, and there's a lot here, so this is in a small font; I apologize to those of you at the back, there's a lot to put on this slide. We need, first of all, cooperation on AI safety research. It's got to stop being a cottage industry with a few little academic centers here and there. It's also got to stop being what a cynic might describe as a kind of whitewashing operation in companies, where they try to avoid the worst public-relations disasters, like, you know, the language model used a bad word or something like that. In fact, those efforts have not yielded any real safety whatsoever. So there's a great deal of research to do on alignment, which is what I just described, and on containment: how do you get systems that are restricted in their capabilities, that are not directly connected to email and bank accounts and credit cards and social media and all those things? And I think there are probably ways of building restricted-capability systems that are provably safe, because they are restricted to only operate provably sound reasoning engines, for example. But the bigger point is: stop thinking about making AI safe; start thinking about making safe AI. These are just two different mindsets. "Making AI safe" says we build the AI, and then we have a safety team whose job it is to stop it from behaving badly. That hasn't worked, and it's never going to work. We have got to have AI systems that are safe by design, and without that we are lost. We also need, I think, some

international regulatory level to coordinate the regulations that are going to be in place across the various national regimes. So we have to start, probably, with national regulation, but we can coordinate very easily; for example, we could start coordinating tomorrow, to agree on what would be a baseline for regulation. I put a couple of other things there that went by too quickly, so I actually want to go back... oh, okay, too far, all right. So the light-blue line, "transparent, explainable analytical substrate", is really important. At the moment we're building AI systems that are black boxes: we have no idea how they work, we have no idea what they're going to do, and we have no idea how to get them to behave themselves properly. So my guess is that if we define regulations appropriately, so that companies have to build AI systems that they understand and predict and control successfully, those AI systems are going to be based on a very different technology: not giant black-box circuits trained on vast quantities of data, but well-understood, component-based systems that build on centuries of research in logic and provability, where we can actually prove that these systems are going to behave in certain ways. The second thing, the

dark-blue line, "secure PCC-based digital ecosystem": what is that? So PCC is proof-carrying code, and what we need here is a way of preventing bad actors from deploying unsafe systems. It's one thing to say, here's how you build safe systems and everyone has to do that; it's another thing to say, how do you stop people from deploying unsafe systems, people who don't want safe AI systems, who want whatever they want? This is probably even more difficult. Policing software is, I think, impossible. So the place where we do have control is at the hardware level, because, first of all, to build your own hardware costs about a hundred billion dollars and tens of thousands of highly trained engineers. So it provides a control point that's very difficult for bad actors to get around. And what the hardware should do is basically check the proof of a software object before it's run, and check that this is in fact a safe piece of software to run. And proof-carrying code is a technology that allows hardware to check proofs very efficiently. But of course, the onus then is on the developer to provide a proof that the system is in fact safe, and so that's a prerequisite for this approach.
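To make the proof-carrying-code idea concrete, here is a deliberately simplified sketch of my own. In real PCC the certificate shipped with the code is a formal logical proof, checked against a safety policy by a small trusted proof checker; here the "policy" is just a whitelist of operations, so this illustrates the trust structure (cheap checking at the point of execution, with the burden of producing the certificate on the developer), not the real technique.

```python
# Toy illustration of the PCC trust structure: the runtime refuses to execute
# code unless a cheap-to-check certificate establishes that it satisfies the
# safety policy. Here the policy is a whitelist of operations for a tiny stack
# machine; op names and the program format are invented for the example.

ALLOWED_OPS = {"push", "pop", "add", "mul", "print"}  # the safety policy

def check_certificate(program, certificate):
    """Trusted, cheap check: the certificate must cover every op the program
    uses, and every certified op must be allowed by the policy."""
    return set(op for op, *_ in program) <= set(certificate) <= ALLOWED_OPS

def run_if_safe(program, certificate):
    if not check_certificate(program, certificate):
        return "REFUSED: no valid safety certificate"
    stack = []
    for op, *args in program:          # a tiny stack machine
        if op == "push": stack.append(args[0])
        elif op == "add": stack.append(stack.pop() + stack.pop())
        elif op == "mul": stack.append(stack.pop() * stack.pop())
        elif op == "pop": stack.pop()
        elif op == "print": print(stack[-1])
    return "ran"

safe = [("push", 2), ("push", 3), ("add",), ("print",)]
unsafe = [("push", 0), ("send_email", "victim@example.com")]

print(run_if_safe(safe, ["push", "add", "print"]))  # ran (and prints 5)
print(run_if_safe(unsafe, ["push", "send_email"]))  # REFUSED
```

Note the asymmetry: the checker is small and fast, while producing a valid certificate for a genuinely unsafe program is impossible by construction, which is what puts the onus on the developer.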

Let me talk a little bit about regulations. So a number of acts are already in the works; for example, the European AI Act has a hard ban on the impersonation of human beings, so you have a right to know if you're interacting with a machine or a human. This, to me, is the easiest, the lowest-hanging fruit, that every jurisdiction in the world could implement pretty much tomorrow if they so decided. And I believe this is how legislators wake up those long-unused muscles that have lain dormant for decades while technology has just moved ahead unregulated. So this is the place to start.

But we also need some regulations on the design of AI systems specifically. So a provably operable kill switch is a really important and basic thing: if your system is misbehaving, there has to be a way to turn it off. And this has to apply not just to the system that you made but, if it's an open-source system, to any copy of that system. And that means the kill switch has got to be remotely operable, and it's got to be non-removable. So that's a technological requirement on open-source systems, and in fact, if you want to be in the open-source business, you're going to have to figure this out. You're actually going to subject yourself to more regulatory controls than people who operate closed source, and that's exactly as it should be. Imagine if we had open-source enriched uranium, and the purveyor of enriched uranium was responsible for all the enriched uranium that they purveyed to anybody around the world. They're going to have a higher regulatory burden, because that's a blinking stupid thing to do, and so you would expect there to be a higher burden if you're going to do blinking stupid things. And then red lines. This is probably the most important thing. So we

don't know how to define safety, so I can't write a law saying your system has to be provably safe, because it's very hard to write down the dividing line between safe and unsafe. You know, take Asimov's law: you can't harm human beings. Well, what does "harm" mean? That's very hard to define. But we can scoop out very specific forms of harm that are absolutely unacceptable. So self-replication of computer systems would absolutely be unacceptable; that would basically be a harbinger of losing human control, if the system can copy itself onto other computers or break into other computer systems. Absolutely, systems should not be advising terrorists on building biological weapons, and so on. So these lines are things that any normal person would think: well, obviously the software system should not be doing that. And the developers are going to say, oh well, this is really unfair, because it's really hard to make our systems not do this. And the response is: well, tough. Really? You're spending hundreds of billions of pounds on this system and you can't stop it from advising terrorists on building bioweapons? Well then, you shouldn't be in business at all. This is not hard. And legislators, by implementing these red lines, would put the onus on the developer to understand how their own systems work and to be able to predict and control their behavior, which is an absolute minimum we should ask from any industry, let alone one that could have such a massive impact and is hoping for quadrillions of dollars in profits. Thank

you.

Thank you very much, Professor Russell. A quick question, maybe, before we move on to the next speaker. There was some good news in there: it is that we have ideas on how to make safe AI. But how long do you think we're going to need? How long is it going to take, by default, before we have these ideas worked out, and how long might it take if we had all the smart people in the world give up their current focus and instead work on this?

I think these are really important questions, because the political dynamic is going to depend, to some extent, on how the AI safety community responds to this challenge. Because if the AI safety community fails to make progress on any of this stuff, the developers can point and say: look, you know, you guys are asking for stuff that isn't really possible, and we should be

allowed to just do what we want. But if you look at the nuclear industry, how does that work? The regulator says to the nuclear plant operator: show me that your plant has a mean time to failure of 10 million years or more. And the operator has to give them a full analysis, with fault trees and probabilistic calculations, and the regulator can push back and say: you know, I don't agree with that independence assumption; these components come from the same manufacturer, so they're not independent; come back with a better analysis. And so on.
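The independence point in that exchange is easy to see numerically. Here is a toy fault-tree calculation of my own; all the failure probabilities are invented for the example.

```python
# Toy fault-tree calculation showing why the regulator cares about the
# independence assumption. The numbers below are made up for illustration.
p_pump = 1e-3  # chance a single pump fails on demand

# Two redundant pumps behind an AND gate: the system fails only if both fail.
p_both_independent = p_pump * p_pump  # valid only if failures are independent

# If both pumps share a manufacturing defect, failures are correlated. A simple
# common-cause ("beta factor") model: a fraction beta of failures hit both pumps.
beta = 0.1
p_both_common_cause = (1 - beta) ** 2 * p_pump ** 2 + beta * p_pump

print(f"independent:  {p_both_independent:.2e}")
print(f"common cause: {p_both_common_cause:.2e}")
```

With these numbers, the common-cause estimate is about a hundred times worse than the independent one, which is exactly the kind of error the regulator's push-back is meant to catch.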

At the moment, there is nothing like that in the AI industry. There is no logical connection between any of the evidence that people are providing and a claim that the system is actually going to be safe. That argument is just missing. Now, the nuclear industry probably spends more than 90% of its R&D budget on safety. One way you can tell, and I got this statistic from one of my nuclear-engineering colleagues, is that for the typical nuclear plant in the US, for every kilogram of nuclear plant there are seven kilograms of regulatory paperwork. I kid you not. So that tells you something about how much emphasis there has been on safety in that industry. And also, you know, why is there, to a first approximation, no nuclear industry today? It's because of Chernobyl, and because of a failure in safety: actually deliberately bypassing safety measures that they knew were necessary, in order to save money.

We'll take one question from the audience, provided it's a quick question.

I see a hand over there; let me dash down.

Hi, thanks very much for your talk here. My name is Charlie; I'm a senior at UCL. One of the big reasons, I think, why there's so much regulation on nuclear power is widespread public opinion and protests against nuclear power from within the environmental movement. So I wondered whether you thought there's a similar role for public pressure or protests for AI as well? Thanks.

I think that's a very important

question. My sense is, and I'm not really a historian of the nuclear industry per se, that nuclear physicists obviously thought about safety from the beginning. In fact, Leo Szilard was the one who invented the basic idea of the nuclear chain reaction, and he instantly thought about a physical mechanism that could keep the reaction from going supercritical and becoming a bomb. So he thought about this negative-feedback control system, with moderators that would somehow keep the reaction subcritical. People in AI are not at that stage. They just have their eyes on "we can generate energy", and they're not even thinking: is that energy going to be in the form of a bomb or electricity? They haven't got to that stage yet. So we are very much at the preliminary stage. I do worry that AI should not be politicized, and at the moment there's a precarious bipartisan agreement in the US, and to some extent in Europe; I worry about that breaking down. In the UK, I think it's really important that the political message be very straightforward: you can be on the side of humans, or you can be on the side of our AI overlords; which do you want to be on? And so let's try to keep it a unified message around developing technology in a way that's safe and beneficial for humans. We should raise awareness, but we shouldn't do it in a partisan way. And, yes, I totally sympathize with the idea that people have a right to be very upset that, you know, multi-billionaires are playing poker with the future of the human race. It's entirely reasonable. But what I worry about is exactly that certain types of protest end up getting aligned in a way that's unhealthy; it sort of becomes anti-technology. And we can look back at what happened with GM organisms, for example, which most scientists think didn't go the way it should have, and we lost benefits without gaining any safety.

A lot to think about there. Thank you very much, Professor Stuart Russell.

Thank you.

We may give you the microphone again a...