Existential Risk Observatory. AI Summit Talks featuring Professor Stuart Russell. 31 OCT 2023.
What if we succeed?
Lift the living standards of everyone on Earth to a respectable level: a 10x increase in world GDP ($13.5Q net present value)
“So, unlike Fusion, this [AGI] is getting closer and closer and closer rather than further and further into the future. So we have to ask what happens if we actually succeed in creating General Purpose AI. And the reason we’re trying to do it is because it could be so transformative to human civilization. Very crudely our civilization results from our intelligence. If we have access to a lot more we could have a lot better civilization. One thing we could do is simply deliver what we already know how to deliver which is a nice middle class standard of living. If you want to think of it that way we could deliver that to everyone on Earth, at almost no cost and that would be about a 10-fold increase in GDP. And the net present value of that is $13.5 quadrillion dollars. So that’s a lower bound on the cash value of creating General Purpose AI. So if you want to understand why we’re investing hundreds of billions of pounds in it, it’s because the value is millions of times larger than that. And so that creates a magnet in the future that is pulling us forward inexorably.” – Professor Stuart Russell, AI Summit, Bletchley Park, UK, 31 OCT 2023
And to me it's not clear, right: are we at the Wright brothers stage, or are we at the Montgolfier stage, where we have a lot of hot air? So my current view is: no, we have not succeeded. The models that people are excited about, the large language models and their extensions into multimodal models that take in video and can actually operate robots and so on, are a piece of the puzzle. CNN made this lovely animated GIF to illustrate the idea that we don't really know what shape that piece of the puzzle is, and we don't know what other pieces are needed or how they fit together to make general-purpose intelligence. We may discover what's going on inside the large language models; we may figure out what source of power they're drawing on to create the kinds of surprisingly capable behaviors that they do exhibit. But at the moment that remains a mystery.

And there are some gaps. One of the achievements of
modern AI that people were most proud of, and also most certain of, was the defeat of human Go champions by AlphaGo and then AlphaZero in the 2016 to 2018 period. In Go, for those of you who don't know, there's a board, you put pieces on it, and your goal is to surround territory and to surround your opponent's pieces and capture them. Since AI systems beat the world champion in 2017, they've gone on to leave the human race in the dust. The highest-ranked program is KataGo, and its rating is about 5,200, compared to the human world champion at about 3,800. Well below the human world champion is our colleague Kellin Pelrine, a grad student and a decent amateur Go player; his rating is about 2,300. Now I'll show you a game between Kellin and KataGo, where Kellin actually gives KataGo a nine-stone handicap. So KataGo is black and starts with nine stones on the board. If you're an adult Go player and you're teaching a five-year-old how to play Go, you give them a nine-stone handicap so that at least they can stay in the game for a few minutes. So here we are treating KataGo as if it's a baby, despite the fact that it's massively superhuman.
And here's the game. It's sped up a little bit, but watch what happens in the bottom right corner. White, the human being, is going to start building a little group of stones there, and then black very quickly surrounds that group to make sure that it can't grow, and also to have a pretty good chance of capturing it. But now white starts to surround the black stones, and interestingly, black doesn't seem to pay any attention to this. It doesn't understand that the black stones are in danger of being captured, which is a very basic thing: you have to understand when your opponent is going to capture your pieces. Black just pays no attention, loses all of those pieces, poof, and that's the end of the game. So something weird happens there, where an ordinary human amateur Go player can beat a Go program that's stratospherically better than any human being has ever been in history. In fact, the Go programs do not correctly understand what it means for a group of stones to be alive or dead, which is the most basic concept in the game of Go. They have only a limited, fragmentary approximation to the
definition of life and death, and that's actually a symptom of one of the weaknesses of training circuits to learn these concepts. Circuits are a terrible representation for concepts such as life and death, which can be written down in Python in a couple of lines, or in logic in a couple of lines; but in circuit form you can't actually write a correct definition of life and death at all, you can only write finite approximations to it, and the systems are not learning a very good approximation. So they're very vulnerable. And this turns out to be applicable not just to KataGo but to all the other leading Go programs, which are trained by completely different teams on completely different data using completely different training regimes, but they all fail against this very simple strategy. So this suggests that we are overrating the systems we have been building, in a real sense, and I think that's important to understand.
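To make the contrast concrete, here is a minimal, hypothetical Python sketch of the rule that underlies life and death, namely that a group with no liberties is captured. Real life-and-death reasoning also requires looking ahead at future play, but the underlying concept fits in a few lines of code in a way that no finite circuit approximation can match.

# Minimal sketch (illustrative, not the speaker's code): a Go group and its
# liberties via flood fill. A group with zero liberties is captured, which is
# the rule on which the concepts of life and death are built.

def group_and_liberties(board, start):
    # board: dict mapping (row, col) -> 'B', 'W', or None; start: position of a stone
    color = board[start]
    group, liberties, frontier = {start}, set(), [start]
    while frontier:
        r, c = frontier.pop()
        for nbr in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nbr not in board:
                continue                      # off the edge of the board
            if board[nbr] is None:
                liberties.add(nbr)            # adjacent empty point
            elif board[nbr] == color and nbr not in group:
                group.add(nbr)
                frontier.append(nbr)
    return group, liberties

def is_captured(board, stone):
    _, liberties = group_and_liberties(board, stone)
    return len(liberties) == 0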
Another way to make this argument is to look at things that humans can do. For example, we can build the Laser Interferometer Gravitational-Wave Observatory (LIGO). These are black holes colliding on the other side of the universe; this is the LIGO detector, which is several kilometers long, is full of physics, and is able to detect distortions of space down to the 18th decimal place. It was able to measure exactly what the physicists predicted would be the shape of the waveform arriving from the collision of two black holes, and was even able to measure the masses of the black holes on the other side of the universe when they collided. So could ChatGPT do this? Could any deep learning system do this, given that there are exactly zero training examples of a gravitational wave detector? I think at the moment there is still a long way to go. On the other
hand, people are extremely ingenious, and people are working on hybrids of large language models with reasoning and planning engines that could start to exhibit these capabilities quite soon. People I respect a great deal think we might only have five years until this happens. Almost everyone has now gone from 30 to 50 years, which was the estimate a decade ago, to 5 to 20 years, which is the estimate right now.
So unlike fusion, this is getting closer and closer and closer, rather than further and further into the future. So we have to ask: what happens if we actually succeed in creating general-purpose AI? And the reason we're trying to do it is because it could be so transformative to human civilization. Very crudely, our civilization results from our intelligence; if we have access to a lot more, we could have a lot better civilization. One thing we could do is simply deliver what we already know how to deliver, which is a nice middle-class standard of living, if you want to think of it that way. We could deliver that to everyone on Earth at almost no cost, and that would be about a 10-fold increase in GDP. The net present value of that is $13.5 quadrillion. So that's a lower bound on the cash value of creating general-purpose AI. If you want to understand why we're investing hundreds of billions of pounds in it, it's because the value is millions of times larger than that. And so that creates a magnet in the future that is pulling us forward inexorably.
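For scale, here is one hedged, back-of-envelope way to arrive at a figure of that magnitude; the GDP baseline and discount rate below are assumptions chosen purely for illustration, not inputs stated in the talk.

# Hypothetical reconstruction of the quoted $13.5 quadrillion figure.
# The baseline world GDP and the discount rate are assumed, only to show
# the order of magnitude of the arithmetic.

world_gdp = 90e12              # assumed current world GDP, ~$90 trillion per year
increment = 9 * world_gdp      # a 10-fold increase adds roughly 9x the baseline, every year
discount_rate = 0.06           # assumed annual discount rate

npv = increment / discount_rate                  # perpetuity approximation of net present value
print(f"NPV ~ ${npv / 1e15:.1f} quadrillion")    # prints: NPV ~ $13.5 quadrillion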
My friend Jaan Tallinn here likes to call this Moloch, the sort of ineluctable force that draws people towards something even though they know it could be their own destruction.

And we could actually have an even better civilization.
We could one day have a clicker that works. We could have health care that's a lot better than we have now. We could have education brought to every child on Earth that would exceed what we can get from even a professional human tutor; this, I think, is the most feasible thing for us to do that would benefit the world in this decade, and I think it is entirely possible. Health care is actually a lot more difficult, for all kinds of reasons, but education is a digital good that can be delivered successfully. And we could also have much better progress in science, and so on.

On the other hand, AI amplifies a
lot of difficult issues that policymakers have been facing for quite a while. One is its ability to magnify the pollution of our information ecosystem with disinformation, what some people call truth decay, and this is happening at speed. But if we thought about it really hard, AI could actually help in the other direction: it could help clean up the information ecosystem; it could be used as a detector of misinformation, as something that assembled consensus truth and made it available to people. We're not using it in that way, but we could. Ditto with democracy: is it being suppressed by surveillance and control mechanisms, or could we use AI systems to strengthen it, to allow people to deliberate, cooperate, and reach consensus on what to do? Could it be that individuals are empowered, or, on the current trajectory we're on, are individuals enfeebled as we gradually hand over more and more of the functions of civilization, and humans lose the ability to even run their own civilization? These are important questions that we have to address while we're considering all of the safety issues that I'll be getting to soon. There's inequality: right now we're on the path of magnifying it with AI, but it doesn't have to be that way. And so on.
I won't go through all of these issues, because each of them is worthy of an entire talk in itself. The, I would say, mid-term question is: what are humans going to be doing? If we have general-purpose AI that can do all the tasks, or nearly all the tasks, that human beings get paid for right now, what will humans do? This is not a new issue: Aristotle talked about it in 350 BC. Keynes, and since we're in Milton Keynes it's odd, the town is pronounced "Keens" but his name is pronounced "Kaynes", even though the town is named after him, so Keynes in 1930 said: "Thus for the first time since his creation, man will be faced with his real, his permanent problem: how to use his freedom from pressing economic cares, which science will have won for him, to live wisely and agreeably and well." So this is a really important problem, and again it is one that policymakers are misunderstanding. I would say the default answer in most governments around the world is: we'll retrain everyone to be a data scientist, as if somehow the world needs three, three and a half, four billion data scientists. I think that's probably not the answer. But again, the default path is one of enfeeblement, which is illustrated really well by WALL-E.

So my answer to this question is that, in the future, if we are
successful in building AI that is safe, that does a lot of the tasks that we want done for us, most human beings are going to be in these interpersonal roles. And for those roles to be effective, they have to be based on understanding. Why is a surgeon effective at fixing a broken leg? Because we have done centuries of research in medicine and surgery to make that a very effective, and in some countries very highly paid and very prestigious, profession. But most interpersonal roles, think about child care or elder care, are not highly paid and not highly prestigious, because they are based on no science whatsoever, despite the fact that our children are our most precious possessions, as politicians like to say a lot. In fact we don't understand how to look after people, and we don't understand how to make people's lives better. So this is a very different direction for science, much more focused on the human than on the physical world.
So now let me move on, if I can get the next slide up, to Alan Turing's view of all this: what happens if we succeed? He said that it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers; at some stage therefore we should have to expect the machines to take control. He said this in 1951, and to a first approximation, for the next 70-odd years we paid very little attention to his warning. I used to illustrate this with the following imaginary email conversation. An alien civilization sends email to the human race, humanity@un.org: "Be warned, we shall arrive in 30 to 50 years." That was what most AI people thought back then; now we would say maybe 10 to 20 years. And humanity replies: "Humanity is currently out of the office. We will respond to your message when we return," and then there should be a smiley face, there it is. So that's now changed.
Unfortunately that slide wasn't supposed to come up like that; let me see if we can, oh well, can't fix it now. So I think early this year three things happened in very quick succession. GPT-4 was released; then Microsoft, which had been working with GPT-4 for several months at that point, published a paper saying that GPT-4 exhibited sparks of artificial general intelligence, exactly what Turing warned us about; and then FLI released the open letter asking for a pause on giant AI experiments. I think at that point, very clearly, humanity returned to the office and saw the emails from the aliens. And the reaction since then, I think, has been somewhat similar to what would happen if we really did get an email from the aliens. There have been global calls for action. The very next day, UNESCO responded directly to the open letter, asking all its member governments, which is all the countries on Earth, to immediately implement its AI principles in legislation, in particular principles that talk about robustness, safety, predictability, and so on. Then there are China's AI regulations; the US got into the act very quickly; the White House called an emergency meeting of AI CEOs; OpenAI called for governments to regulate AI; and so on. I ran out of room on the slide on June 7th, with Rishi Sunak announcing the global summit on AI safety, which is happening tomorrow. Lots of other stuff has happened since then, but it's really, I
would say, to the credit of governments around the world, how quickly they have changed their position on this. For the most part, governments were saying, you know, regulation stifles innovation; if someone did mention risk, it was either dismissed or viewed as something easily taken care of by the market, by liability, and so on. So I would say that the view, the understanding, has changed dramatically, and that could not have happened without the fact that politicians started to use ChatGPT and saw it for themselves. I think that changed people's minds.

So the question we have to face, then, is this one: how do we retain power over entities more powerful than ourselves, forever? I think this is the question that Turing asked himself, and he gave that answer: we would have to expect them to take control. In other words, this question doesn't have an answer. But I think there's another version of the question which works somewhat more to our advantage. It should appear any second,
right. And it has to do with how we define what we're trying to do: what is the system that we're building, what problem is it solving? We want a problem such that we are happy to set up an AI system to solve it. The standard model that I gave you earlier was: systems whose actions can be expected to achieve their objectives. And that's exactly where things go wrong: systems are pursuing objectives that are not aligned with what humans want the future to be like, and then you're setting up a chess match between humanity and a machine that's pursuing a misaligned objective. So instead we want to figure out a problem whose solution is such that we're happy for AI systems to instantiate that solution. And it's not imitating human behavior, which is what we're training LLMs to do; that's actually a fundamental and basic error, and it's essentially why we can't make LLMs safe: because we have trained them to not be safe, and trying to put sticking plasters on all the problems after the fact is never going to work. So instead I think we have
to build systems that are provably beneficial to humans. The way I'm thinking about that currently is that the system should act in the best interests of humans, but be explicitly uncertain about what those best interests are. I'm just telling you this in English, but it can be written in a formal framework called an assistance game. So what we do is build assistance-game solvers; we don't build objective maximizers, which is what we have been doing up to now. We build assistance-game solvers. This is a different kind of AI system, and we've only been able to build very simple ones so far, so we have a long way to go. But when you build those systems and look at the solutions, they exhibit the properties that we want from AI systems: they will defer to human beings, and in the extreme case they will allow themselves to be switched off. In fact, they want to be switched off if we want to switch them off, because they want to avoid doing whatever it is that is making us upset. They don't know what it is, because they're uncertain about our preferences, but they want to avoid upsetting us, and so they are happy to be switched off. In fact, this is a mathematical theorem: they have a positive incentive to allow themselves to be switched off, and that incentive is connected directly to point two, the uncertainty about human preferences.
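Here is a toy numerical illustration of that incentive, not the full assistance-game formalism; the belief distribution below is made up, and the point is only that deferring to a human who can switch the system off is never worse in expectation, and is strictly better whenever the system is genuinely uncertain about the sign of the human's utility.

# Toy illustration (assumed numbers): a robot is uncertain about the human's
# utility u for its proposed action. "Act now" earns E[u]; "defer" lets the
# human switch the robot off whenever u < 0, earning E[max(u, 0)].

import random

random.seed(0)
belief = [random.gauss(0.1, 1.0) for _ in range(100_000)]    # samples from the robot's belief over u

act_now = sum(belief) / len(belief)                          # expected utility of acting directly
defer = sum(max(u, 0.0) for u in belief) / len(belief)       # expected utility of deferring

print(f"act without asking: {act_now:.3f}")
print(f"defer to the human: {defer:.3f}")
# defer >= act_now always; they are equal only when the robot is certain of the
# sign of u, which is why the incentive to allow shutdown comes from uncertainty
# about human preferences.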
So there's a long way to go, as I said, and we're not ready to say, okay, everyone in all these companies, stop doing what you're doing and start building these things instead. That probably is not going to go down too well, because we don't really know how to build these things at scale and to deliver economic value; but in the long run this is the right way to build AI systems. So in between, what should we do? This is a lot of what's going to be discussed tomorrow, and there's a lot; this is in a small font, I apologize to those of you at the back, there's a lot to put on this slide. We need, first of all, cooperation on AI safety research. It's got to stop being a cottage industry with a few little academic centers here and there. It's also got to stop being what a cynic might describe as a kind of whitewashing operation in companies, where they try to avoid the worst public relations disasters, like, you know, the language model used a bad word or something like that, but in fact those efforts have not yielded any real safety
whatsoever. So there's a great deal of research to do: on alignment, which is what I just described; and on containment, how do you get systems that are restricted in their capabilities, that are not directly connected to email and bank accounts and credit cards and social media and all those things? I think there are probably ways of building restricted-capability systems that are provably safe, because they are restricted to only operate provably sound reasoning engines, for example. But the bigger point is: stop thinking about making AI safe; start thinking about making safe AI. These are just two different mindsets. The making-AI-safe mindset says we build the AI and then we have a safety team whose job it is to stop it from behaving badly; that hasn't worked, and it's never going to work. We have got to have AI systems that are safe by design, and without that we are lost. We also need, I think, some international regulatory level to coordinate the regulations that are going to be in place across the various national regimes. So we have to start, probably, with national regulation, but we can coordinate very easily; for example, we could start coordinating tomorrow to agree on what would be a baseline for regulation. I put a couple of other
things there that went by too quickly, so I actually want to go back. Oh, okay, too far, all right. So the light blue line, a transparent, explainable, analytical substrate, is really important. At the moment we're building AI systems that are black boxes: we have no idea how they work, we have no idea what they're going to do, and we have no idea how to get them to behave themselves properly. So my guess is that if we define regulations appropriately, so that companies have to build AI systems that they understand and predict and control successfully, those AI systems are going to be based on a very different technology: not giant black-box circuits trained on vast quantities of data, but actually well-understood, component-based systems that build on centuries of research in logic and probability, where we can actually prove that these systems are going to behave in certain ways. The second thing, the
dark blue line, a secure PCC-based digital ecosystem: what is that? PCC is proof-carrying code, and what we need here is a way of preventing bad actors from deploying unsafe systems. It's one thing to say, here's how you build safe systems and everyone has to do that; it's another thing to say, how do you stop people from deploying unsafe systems, people who don't want safe AI systems, they want whatever they want? This is probably even more difficult. Policing software is, I think, impossible. So the place where we do have control is at the hardware level, because, first of all, to build your own hardware costs about a hundred billion dollars and tens of thousands of highly trained engineers. So it provides a control point that's very difficult for bad actors to get around. And what the hardware should do is basically check the proof of a software object before it's run, and check that this is in fact a safe piece of software to run. Proof-carrying code is a technology that allows hardware to check proofs very efficiently, but of course the onus then is on the developer to provide a proof that the system is in fact safe, and that's a prerequisite for this approach.
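As a rough, hypothetical sketch of the proof-carrying-code idea: the software object ships with a certificate, and the "hardware" (here just a loader function) checks the certificate against a fixed safety policy before anything runs. Real PCC uses machine-checkable logical proofs of much richer properties; the stand-in certificate below is simply a claimed list of called functions, verified by inspecting the code.

# Illustrative only: a loader that refuses to run code unless its certificate
# checks out against a fixed policy. The policy and the certificate format are
# assumptions made up for this sketch.

import ast

FORBIDDEN = {"exec", "eval", "open"}          # stand-in safety policy

def check_certificate(source: str, claimed_calls: set) -> bool:
    # Verify that the claimed call set covers the actual calls and respects the policy.
    actual = {node.func.id
              for node in ast.walk(ast.parse(source))
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return actual <= claimed_calls and not (claimed_calls & FORBIDDEN)

def load_and_run(source: str, claimed_calls: set):
    if not check_certificate(source, claimed_calls):
        raise RuntimeError("certificate rejected: refusing to run")
    exec(compile(source, "<checked>", "exec"))

# The onus is on the developer to supply a certificate the checker will accept:
load_and_run("print(sum(range(10)))", {"print", "sum", "range"})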
Let me talk a little bit about regulation. A number of acts are already in the works; for example, the European AI Act has a hard ban on the impersonation of human beings, so you have a right to know if you're interacting with a machine or a human. This, to me, is the easiest, the lowest-hanging fruit, that every jurisdiction in the world could implement pretty much tomorrow if they so decided, and I believe that this is how legislators wake up those long-unused muscles that have lain dormant for decades while technology has just moved ahead unregulated. So this is the place to start. But we also need some regulations on the design of AI systems specifically. A provably operable kill switch is a really important and basic thing: if your system is misbehaving, there has to be a way to turn it off, and this has to apply not just to the system that you made but, if it's an open-source system, to any copy of that system. That means the kill switch has got to be remotely operable and it's got to be non-removable. So that's a technological requirement on open-source systems, and in fact, if you want to be in the open-source business, you're going to have to figure this out; you're actually going to subject yourself to more regulatory controls than people who operate closed source, and that's exactly as it should be. Imagine if we had open-source enriched uranium, and the purveyor of enriched uranium was responsible for all the enriched uranium that they purveyed to anybody around the world. They're going to have a higher regulatory burden, because that's a blinking stupid thing to do, and so you would expect there to be a higher burden if you're going to do blinking stupid
things. And then red lines; this is probably the most important thing. We don't know how to define safety, so I can't write a law saying your system has to be provably safe, because it's very hard to draw the dividing line between safe and unsafe. You know, if you take Asimov's law, you can't harm human beings; well, what does harm mean? That's very hard to define. But we can carve out very specific forms of harm that are absolutely unacceptable. Self-replication of computer systems would absolutely be unacceptable; that would basically be a harbinger of losing human control, if the system can copy itself onto other computers or break into other computer systems. Absolutely, a system should not be advising terrorists on building biological weapons, and so on. These lines are things that any normal person would think: well, obviously the software system should not be doing that. And the developers are going to say, oh well, this is really unfair, because it's really hard to make our systems not do this. And the response is: well, tough. Really? You're spending hundreds of billions of pounds on this system and you can't stop it from advising terrorists on building bioweapons? Well then, you shouldn't be in business at all. This is not hard, and legislators, by implementing these red lines, would put the onus on the developer to understand how their own systems work and to be able to predict and control their behavior, which is an absolute minimum we should ask from any industry, let alone one that could have such a massive impact and is hoping for quadrillions of dollars in profits. Thank you.
Thank you very much, Professor Russell. A quick question, maybe, before we move on to the next speaker. There was some good news in there: it is that we have ideas on how to make safe AI. But how long do you think we're going to need? How long is it going to take, by default, to have these ideas worked out, and how long might it take if we had all the smart people in the world give up their current focus and instead work on this?

I think these are really important questions, because the political dynamic is going to depend to some extent on how the AI safety community responds to this challenge. If the AI safety community fails to make progress on any of this stuff, the developers can point and say, look, you guys are asking for stuff that isn't really possible, and we should be
allowed to just do what we want. But if you look at the nuclear industry, how does that work? The regulator says to the nuclear plant operator: show me that your plant has a mean time to failure of 10 million years or more. And the operator has to give them a full analysis with fault trees and probabilistic calculations, and the regulator can push back and say, you know, I don't agree with that independence assumption, these components come from the same manufacturer so they're not independent, come back with a better analysis, and so on.
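To show what that kind of push-back changes numerically, here is a toy fault-tree calculation with made-up numbers: two redundant safety components, each with some per-year failure probability. Under an independence assumption the mean time to failure clears a 10-million-year bar, while a modest common-cause failure rate (say, a shared manufacturing defect) brings it far below.

# Illustrative fault-tree arithmetic; all probabilities are invented.

p = 1e-4                        # per-year failure probability of each of two redundant components

p_both_independent = p * p      # plant fails only if both fail, assumed independent
p_both_common = 0.1 * p         # assume 10% of one component's failures also take out the other

print(f"independence assumed:  MTTF ~ {1 / p_both_independent:,.0f} years")   # ~100,000,000 years
print(f"common-cause failures: MTTF ~ {1 / p_both_common:,.0f} years")        # ~100,000 years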
At the moment there is nothing like that in the AI industry: there is no logical connection between any of the evidence that people are providing and a claim that the system is actually going to be safe. That argument is just missing. Now, the nuclear industry probably spends more than 90% of its R&D budget on safety. One way you can tell, and I got this statistic from one of my nuclear engineering colleagues, is that for the typical nuclear plant in the US, for every kilogram of nuclear plant there are seven kilograms of regulatory paperwork, I kid you not. So that tells you something about how much emphasis there has been on safety in that industry, and also, you know, why there is, to a first approximation, no nuclear industry today: it's because of Chernobyl, and because of a failure in safety, actually deliberately bypassing safety measures that they knew were necessary in order to save money.

We'll take one question from the audience, provided it's a quick question. I see a hand over there; let me dash
down.

Hi, thanks very much for your talks here. My name is Charlie; I'm a senior at UCL. One of the big reasons, I think, why there's so much regulation on nuclear power is widespread public opinion and protests against nuclear power from within the environmental movement. So I wondered whether you thought there's a similar role for public pressure or protests for AI as well. Thanks.

I think that's a very important question. My sense is, and I'm not really a historian of the nuclear industry per se, obviously nuclear physicists thought about safety from the beginning. In fact, Leo Szilard was the one who invented the basic idea of the nuclear chain reaction, and he instantly thought about a physical mechanism that could keep the reaction from going supercritical and becoming a bomb. So he thought about this negative feedback control system with moderators that would somehow keep the reaction subcritical. People in AI are not at that stage; they just have their eyes on, you know, we can generate energy, and they're not even thinking, is that energy going to be in the form of a bomb or electricity? They haven't
got to that stage yet, so we are very much at the preliminary stage. I do worry that AI should not be politicized, and at the moment there's a precarious bipartisan agreement in the US, and to some extent in Europe; I worry about that breaking down. In the UK, I think it's really important that the political message be very straightforward: you can be on the side of humans, or you can be on the side of our AI overlords; which do you want to be on? And so let's try to keep it a unified message around developing technology in a way that's safe and beneficial for humans. We should raise awareness, but we shouldn't do it in a partisan way. And I totally sympathize with the idea that people have a right to be very upset that multi-billionaires are playing poker with the future of the human race; it's entirely reasonable. But what I worry about is exactly that: certain types of protest end up getting aligned in a way that's unhealthy, it sort of becomes anti-technology, and we can look back at what happened with GM organisms, for example, which most scientists think didn't go the way it should have, and we lost benefits without gaining any safety.

A lot to think about there. Thank you very much, Professor Stuart Russell.

Thank you. We may give you the microphone again a