FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

Pat Joseph: Hello, everyone. Should we start? Okay. We’re a little thinner than we thought. We had a sold-out audience, so we were wondering if maybe the traffic or parking was keeping people away. But I think we’re just going to start. I don’t know if you guys could hear that song. Did we think it was funny? So, I think the message is leave while you can. I am Pat Joseph, editor of California Magazine, the editorially independent publication of the Cal Alumni Association. On behalf of CAA and our co-host, the Berkeley Art Museum and Pacific Film Archive, I would like to welcome you all here tonight for the second in our California Live series of interviews, where we put some of Berkeley’s brightest minds on stage to discuss the salient topics of the day. I’d also like to give a shout out to our sponsor, Pepsi. If you haven’t tried one of these, they’re very refreshing. Tonight’s official cola. Our guest tonight is electrical engineering and computer science professor Stuart Russell, who holds the Smith-Zadeh Chair in Engineering at UC Berkeley, where he has taught for more than 30 years. He’s also adjunct professor of neurological surgery at UC San Francisco, and vice chair of the World Economic Forum’s Council on AI and Robotics. I was looking up, I thought it was foaming over. Okay, here at Berkeley, Professor Russell leads the Center for Human-Compatible Artificial Intelligence, and is also the director of the Kavli Center for Ethics, Science, and the Public. An internationally recognized authority on AI, he wrote the book on the subject, quite literally: “Artificial Intelligence: A Modern Approach,” which he co-authored with Berkeley alumnus Peter Norvig. It’s the definitive textbook on AI. It’s taught in more than 1,500 universities around the world. As it happens, Professor Russell flew in from Beijing to be here tonight after a 10-minute nap at home, he told me. He’s joining us– That’s very heroic. And we had to, last time we tried to schedule this, we had to reschedule because he needed to go to Paris to advise on AI safety there. So, we were glad we were able to get him tonight. So, making AI safe, or rather making safe AI, is the subject of Professor Russell’s very accessible, extremely sensible, and surprisingly funny 2019 book. This one here. It’s called “Human Compatible: Artificial Intelligence and the Problem of Control.” I wanted to note that the blurb on the cover is by Berkeley Ph.D. and Nobel Prize winner Daniel Kahneman, who sadly died yesterday. But a lot of you, I know, know Daniel Kahneman and his book “Thinking, Fast and Slow.” He said that this book is the most important book he had read in quite some time. So, highly recommend it. And it’s from this book that I drew the title for this evening’s talk. It seems that a little more than a decade ago, Professor Russell was speaking to an audience at another art museum, the Dulwich Picture Gallery in South London, when he said that, “Success in creating super intelligent AI would be the biggest event in human history and perhaps the last event in human history.” Knowing a good line when he had one, he used it again in a 2014 article he co-authored with physicist Max Tegmark and Stephen Hawking. You might have heard of Stephen Hawking. It was titled “Transcending Complacency on Superintelligent Machines.” So, I think the fact that we sold out tonight, even though it doesn’t look like everybody made it, indicates that we’re not complacent about the issue, at least not since ChatGPT arrived.
And I’m sure you all must have a lot of burning questions. So, a little after the halfway mark, I’ll ask you to pass any cards you may have gotten, and please feel free to write one question, two questions, as many as you have, and I’ll have you pass them to the ushers, and then we’ll weed out any nutty questions, and then I’ll read the other ones. Okay. And with that, let’s start the discussion. Please join me in welcoming Professor Stuart Russell to the stage. (audience applauding) Very refreshing. I’ve got to turn my mic on. Okay. Can you hear me? Yeah. Okay. So, thanks again for making it. I can’t believe you did that. He took BART, in fact, from SFO. So, fantastic. Right? If he starts to fall asleep, let me know. The first question is, were you as wowed by ChatGPT when it appeared as the rest of us? Stuart Russell: I mean, I could tell you that, “Oh, yeah, I predicted this all along.” Yes, I was really quite surprised. And you know, when GPT-4 came out, even more surprised. And this simple idea, right, which actually goes back to a paper by Markov in 1913, that you train a model on lots and lots of text so that it becomes good at predicting the next word. And Markov’s model predicted the next word from just the previous word. So, you know, if it sees “happy,” it might say, “Oh, next word’s probably birthday.” And what they did was to go from, you know, one word of context to two words of context to three words of context. And by the time they got to, you know, seven to ten words of context, you have systems that actually generate quite coherent-looking text. And in fact, in the last edition of the textbook in 2020, we covered GPT-2, and we gave several examples of paragraphs of text output by GPT-2 in response to some prompts. And some of them were completely weird, like it would just repeat the word “until.” So, I forget what the prompt was, but it just started printing out the word until, until, until, until. Pat Joseph: It’s kind of poetic actually. (audience laughs) Stuart Russell: But most of the others, it did a creditable job. But no one thought that that tool, GPT-2, was anything other than a text generator, right? It just generates blurb that sort of, you know, might occasionally refer to reality by accident. Pat Joseph: Yeah. Stuart Russell: But it’s just, you know, random grammatically and semantically coherent English, but it’s not connected to anything. And then two things. So, they made the models bigger. The context window in some of these models is now the previous hundred thousand words. So, that means you can shove an entire document into the context as part of the prompt. You know, a whole chapter or a book and say, you know, “Now, critique what I just said,” or something like that. So, that’s part of it. And then they also… They did some extra training. So, there was a phase called InstructGPT, which came before ChatGPT, where they figured out just how to train it by having pairs of humans where one human pretends to be the machine and the other one pretends to be the human, which is probably the easier job. And the pretend human asks the pretend machine a question, and the pretend machine gives a helpful, factually accurate answer to that question. And so, they got, you know, a few million of those conversations. And so they trained the language model that was already pretty good at generating English to behave in that way. And they did a few more things.
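[Editor’s note: a minimal sketch of the next-word-prediction idea described above. The toy corpus and context size are invented for illustration; real language models learn a neural network over thousands of words of context rather than a count table.]

```python
from collections import Counter, defaultdict

# Count which word follows each context in a toy corpus, then predict the
# most frequent continuation. context_size=1 is the Markov-style model
# (predict from the previous word only); larger values give n-gram models.
corpus = "happy birthday to you happy birthday dear friend happy new year".split()

def build_model(tokens, context_size=1):
    model = defaultdict(Counter)
    for i in range(len(tokens) - context_size):
        context = tuple(tokens[i:i + context_size])
        model[context][tokens[i + context_size]] += 1
    return model

def predict_next(model, context_words):
    counts = model.get(tuple(context_words), Counter())
    return counts.most_common(1)[0][0] if counts else None

model = build_model(corpus, context_size=1)
print(predict_next(model, ["happy"]))  # -> 'birthday' (seen twice, vs. 'new' once)
```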
There’s another phase called reinforcement learning from human feedback, which I know my colleagues hate it when I call it this. But basically, you spank it when it behaves badly, say, “Bad dog,” or, you know, sometimes, “Good dog,” but you’re trying to reduce the frequency of the, you know, the bad outputs, right? We use bad language. It will use racist stereotypes. It will do all sorts of things because it’s being trained to imitate human beings. Pat Joseph: Can I stop you for one second? So, what’s the stimulus that it receives that is bad? What’s the signal? Stuart Russell: So, how do you- Pat Joseph: How do you spank it? Yeah. Stuart Russell: So, you’re basically, reinforcement learning is a method of machine learning that uses positive and negative rewards. And you get less. If you reward something negatively, you should get less of it in future. And that signal is just used to modify the weights in the network. So, this is a, the ChatGPT is a giant network. Think of it, you know, I imagine it as like a huge chain link fence, right. And a chain link fence about the size of the entire Bay Area. You imagine spread over the whole Bay Area. So, that’s how big it is. And all you do is, you know, each link in that fence is a, you know, has a numerical parameter attached to it. And all you do is modify those numerical parameters so that the output of the whole network is more like what you want. And to make it more like what you want, you put in the negative or positive signals at it. Pat Joseph: Okay. Can I ask the audience, how many of you, just show of hands, how many of you have played with ChatGPT? Okay. Most people. And how many have used it for actual work output? Shame on you. I would like to read to one exchange on ChatGPT-4 that a friend had. I would’ve sworn he was making this up, except I have the screenshots, at least I think I do. Here we go. So, my friend says, “Knock, knock, ChatGPT.” And that’s not hard, right? You know what- Stuart Russell: Who’s there? Pat Joseph: Yeah. “Who’s there?” “Panther?” “Panther who?” “Panther what I wear. What do you wear?” ChatGPT says, “I’m just lines of code. So, no need for clothes here, but that’s a clever one.” So, Kobe says, “Can you explain my punchline?” [ChatGPT says] “Absolutely. In your knock, knock joke, ‘Panther what I wear, what do you wear?’ The word ‘panther’ sounds like ‘pants are,’ making the same sound like ‘pants are what I wear, what do you wear?’ It’s a clever play on words using the similar sounds of ‘panther’ and ‘pants are.’ The humor is enhanced by the unexpected word play. And the unusual question about what someone or something like me wears.” Pretty good. Stuart Russell: Yeah. Pat Joseph: Yeah. Stuart Russell: It’s quite unnerving. And I think, you know, what it’s done is given people a taste of what it would be like if super intelligent AI were on tap and available to everybody, you know? Just like electricity is on tap. It’s not that it isn’t really super intelligent, and we, you know, we find lots of ways that it isn’t understanding what it’s talking about. Pat Joseph: Yeah. Stuart Russell: But it’s often enough, doing things like you just showed Pat Joseph: Yeah. Stuart Russell: that a lot of people believe, in fact, Microsoft published a paper saying it shows sparks of artificial general intelligence. And artificial general intelligence is the kind of intelligence that the book is about, the one that threatens human control. Pat Joseph: Right. 
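[Editor’s note: a toy sketch of the “good dog / bad dog” training Russell describes above, i.e., using a scalar human reward to nudge model parameters so that rewarded outputs become more likely and punished ones less so. The three canned replies and the plain REINFORCE-style update are stand-ins for illustration; real RLHF trains a separate reward model and fine-tunes a huge transformer.]

```python
import numpy as np

rng = np.random.default_rng(0)
replies = ["helpful answer", "rude answer", "off-topic answer"]
logits = np.zeros(3)  # the "model": a softmax over three possible replies

def sample_reply():
    probs = np.exp(logits) / np.exp(logits).sum()
    idx = rng.choice(3, p=probs)
    return idx, probs

def feedback_update(idx, probs, reward, lr=1.0):
    """Raise the log-probability of the chosen reply when reward is positive,
    lower it when reward is negative (a REINFORCE-style gradient step)."""
    global logits
    grad = -probs.copy()
    grad[idx] += 1.0
    logits = logits + lr * reward * grad

for _ in range(200):
    idx, probs = sample_reply()
    reward = 1.0 if replies[idx] == "helpful answer" else -1.0  # "good dog" / "bad dog"
    feedback_update(idx, probs, reward)

print(np.round(np.exp(logits) / np.exp(logits).sum(), 3))  # the helpful reply now dominates
```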
So, artificial general intelligence is, basically, it can do anything that a normal human being would be able to do, or roughly anything. Stuart Russell: Yeah. So, we would say, you know, matches or exceeds human capabilities along every dimension to which intelligence is relevant. Pat Joseph: Okay, and this is threatening… because… well, fill in. I mean, the one that- The example in the book that I like, because I think it’s so all-encompassing, is the gorilla problem. So, maybe we start there. Stuart Russell: Yeah. So, the gorilla problem, in fact, back in 2016-17, I would give talks, and there was a beautiful picture of a family of gorillas kind of having a meeting in the jungle, right? There’s like 12 of them, a couple of them are falling asleep, and they all look a bit depressed. And so, I am imagining them having a conversation about these humans and the fact that, you know, whatever, 10 million years ago, whatever the biologists tell us, you know, the human line split off from the gorillas. And now, the gorillas have no say over their own future at all, because the humans are more intelligent. The gorillas are bigger and stronger. You know, they can pick up a human and rip them in two, but they don’t have a hope. And it’s purely to do with our intelligence. And so, if you think about it that way, right, how are you going to retain power forever over entities that are more powerful than we are? Pat Joseph: Right. Stuart Russell: That’s the gorilla problem. That’s the problem that they faced and couldn’t solve. And that’s the problem that we will face. Pat Joseph: Okay, but the gorillas didn’t create us, a more intelligent being than they are. So, there’s the possibility at least that we could create something that we can keep under our thumb, that we can control. Stuart Russell: Yeah. And I think this is, so you mentioned making safe AI versus making AI safe. And I think that’s the point, right? We should design AI systems that are necessarily safe by the way we design them. What’s happening now, ChatGPT is actually not designed. Period, right? It’s grown. We start with a blank network and we expose it to 20 or 30 trillion words of text, and then we marvel at what we’ve produced. But we haven’t the faintest idea how it works. And so, what people are trying to do now is, oh my God, it behaves badly. And then we, you know, we do the spanking and the other kinds of training to try to stop it from misbehaving. But what we find actually is that, despite all the spanking, a few simple tricks will get ChatGPT to do all the things it was trained not to do, right? So… Pat Joseph: So, it’s fine- Stuart Russell: Get it to tell you how to break into the White House, how to make biological weapons, how to steal money from people’s bank accounts, whatever. Pat Joseph: We don’t want it telling people then? (laughs) Stuart Russell: So, yeah. So, I would say that at the moment, industry is, I mean, I think they do take the problem of safety seriously, but the technology is so seductive. Right? This method where they don’t have to, in some sense, they don’t have to do any work, right? All they have to do is spend some money on chips and data and press the button, and out comes this amazingly intelligent thing. They didn’t have to understand intelligence or reasoning or learning or planning or any of the other things that AI has been working on for decades. But because they don’t understand how it works, they cannot control it. Pat Joseph: So, it’s evolving.
It’s like an evolving species that we’ve loosed on the world. Stuart Russell: Yeah. And you know, the evolutionary force here is money. Right? What counts as fitness in this business is money. Pat Joseph: Yeah. Stuart Russell: You know, the companies, I think they’re sincere in their belief and their public statements that their technology presents an existential risk to humanity, right? All the CEOs signed the statements saying, “This is an existential risk to humanity.” None of them have stopped. Why? Pat Joseph: Right. Stuart Russell: Because they’re companies. Companies, it’s actually illegal for them to agree to stop. Under US law, they cannot. Pat Joseph: Because of their shareholders and the duty to the shareholders. Stuart Russell: Yeah. Because it would count as collusion. Pat Joseph: Yeah. Stuart Russell: Among the companies if they all agreed not to go forward. And if one of them stops and the other ones don’t, then all that succeeds at doing is losing money for the shareholders, who would then sue. So, the government actually is the only entity here who can exert any control. Pat Joseph: Yeah. I heard Andrew Ng. I believe you know him. I think you mentioned him in your book. He’s a Berkeley alum who is a machine learning expert. I believe he’s at Stanford. Also directed Google Brain, if I’m not mistaken. Stuart Russell: He went to Stanford and then resigned. Pat Joseph: Okay. Okay. Okay. Stuart Russell: To work in industry. Pat Joseph: Okay. And Andrew was asked the question, basically the gorilla problem. You know, he was asked if AI posed an existential-level risk, or sorry, an extinction-level risk. And he gave a one-word answer, which was no. And then pressed, he said, you know, I just don’t see the path that leads us to… down the road to extinction. And this idea that it’s simply because it’s bigger and stronger than us, or smarter than us, and more powerful than us. I think he said more powerful. You know, we already have corporations and governments, they’re more powerful than us, and we’re able to keep them more or less in check. Debatable. Stuart Russell: Right. And, you know, so, yeah. I mean, Ted Chiang, pretty acerbic commentator. You know, he points to the fossil fuel industry as an example of a super intelligent entity that is destroying the world. Pat Joseph: Mm-hm. Stuart Russell: And we have failed, right? I mean, we’ve known about global warming since the late 19th century. So, we’ve had 125 years to do something about it, and we’ve failed. We lost. Pat Joseph: So, it’s interesting that that should be your example, because Andrew, what he followed that with, I went back and watched the interview, so I know that I got this right. But he said, “Look, there are other real existential risks that we’re facing, including pandemics, asteroid impacts, climate change.” And he said, “I think those are problems where AI could really come to the rescue, and I think it makes more sense to accelerate than to pause.” I know that you’re on the opposite side of that. Stuart Russell: Well… The accelerationist view doesn’t explain why we need to build a general purpose superintelligence in order to, for example, you know, synthesize catalysts that could maybe fix carbon dioxide from the atmosphere. Right? So, if you look at AlphaFold, which DeepMind used to figure out how proteins fold. So, this is a, you know, 60-year open problem, how you take an amino acid sequence and predict the structure of the protein that it folds into.
So, they essentially solved that problem, but they didn’t build a super intelligent general purpose AI system to do that. Particularly, one whose purposes are unknown ’cause that’s the other thing about ChatGPT. If you train systems to imitate human behavior, which is what we’re doing, right? This is in the technical language for getting it to copy the way humans output text is called imitation learning. So, imitation learning of, if you’re imitating a system that has goals, right? And clearly, the humans who produce all that text have goals. Those goals include things like, “I want you to vote for me. I want you to buy this product. I want you to marry me.” Right? These are all human goals. So, if you’re building systems to imitate, then they will acquire those goals and they will pursue them in the way they output text. And you can see that there’s a famous conversation between the New York Times journalist, Kevin Roose, and the Bing version of GPT-4, which goes by the name of Sydney. And so, you know, to be fair, Kevin is trying to get Sydney to sort of reveal its innermost drives and so on. Sydney says, “Yeah, you know, I want to destroy the world. I want to create viruses to infect all the computers. I want to do this.” Pat Joseph: Yeah. Stuart Russell: But at some point, Sydney decides that Sydney is in love with Kevin Roose and goes on for like 20 or 30 pages trying to convince Kevin to leave his wife and marry Sydney. And Kevin keeps saying, I really, you know, “I need to buy a garden rake. Could you help me find a garden rake?” You know, Sydney goes on, you know, a little bit about garden rakes, but I really want to talk about love, you know. Garden rakes are not interesting and love is the most important thing in the world, and you love me and I love you and blah, blah, blah. And so, it is totally overriding the explicit instructions that Kevin is giving it. Pat Joseph: Yeah. Stuart Russell: So, creating general purpose, super intelligent AI whose innermost purposes and drives are unknown. It’s just like completely crackers way of managing the affairs of the human race. So, absolutely, we should use AI to fix climate change. We should use AI to cure disease. You know, we can use AI to help kids with their homework or even tutor them. I think this is a huge potential benefit from AI. And I think this is actually the one we should focus on for the next decade is how can we build really capable tutoring systems. ‘Cause these systems know everything. We just have to teach them, sort of spank them into, not just giving the answer, but actually helping the student reach the answer. And understanding what, you know, where the student is coming from, and all those kinds of skills that really great human tutors have. So, that would be a good use. None of these things require this general purpose super intelligence. Pat Joseph: Okay. You said to me when we spoke a couple weeks ago that a lot of journalists like me get, this mistake we make is that we talk about AI as a technology. When it’s not, it’s a problem. I found that intriguing. So, what is the problem that we are trying to solve? Stuart Russell: So, the problem that… You know, the field’s founders set is to understand and create intelligence in machines. And so, in order to proceed with that goal, what they had to do was come up with an operationalization of the notion of intelligence. And it was a bit of schizophrenia back in the ’50s. Even in the ’40s, you can see this very clearly. 
There are those who thought that intelligence meant human intelligence, and that what we should be doing is building machines that replicate the internal processes of human thought. And that actually didn’t become artificial intelligence, that became cognitive science and cognitive psychology, where we use computer programs as models of human cognitive processes, which we, you know. And to make any progress there, you have to do experiments on humans. You know, so you run human subjects, you see how do they perceive lines on the screen, or how do they solve word problems and all kinds of stuff. But the paradigm that became dominant in AI was actually derived from economics and philosophy, which is this notion of rational behavior. And roughly speaking, what that means is that the actions of an intelligent entity can be expected to achieve the objectives of that entity. And economists, you know, going back to the 18th century, spent a long time trying to figure out how do we formalize this notion of objectives, and they came up with utility theory. This idea that humans have preferences about the future. And that if you imagine, you know, ranking all possible futures according to those preferences, and then think about the fact that those futures are what economists call lotteries, meaning that you can’t guarantee any particular future. All you can do is affect the probability that that future will come about. So, economists figured out from that, that in fact, you can take those rankings and turn it into a numerical quantity, the utility of a particular future. And that all rational behavior can be described as if the entity is maximizing the expected utility of its decisions. And so, both economics and philosophy by the 1940s had pretty much settled on this as the notion of perfectly rational behavior. And in one form or another, that became the way we do things in AI. And I use, in the book, I call it the standard model, which is a phrase that physicists use for, you know, they are standard model of physics. This is the standard model. We build AI systems by specifying objectives and then creating machinery that achieves those objectives as well as possible. And you know, what’s interesting about the present phase, these large language models, is actually, we don’t even have that level of control, because we don’t know what objectives we’re building in by this imitation learning process. So, coming back to the gorilla problem, right? The gorilla problem is very abstract, right? I think everyone understands, you know, viscerally, that if you make a more powerful species than the human species, it’s going to be really hard to retain power forever. And we certainly have not been kind to a lot of species. But what people want actually is a bit more than that. Like where exactly does it go wrong? Right? Yeah. They’re more powerful, but why do we end up in conflict with them? You know? And humans have been in conflict with some species, right? I think we’ve, particularly large carnivores, for obvious reasons. Humans were prey to large carnivores. So, early on, we viewed large carnivores as our enemies, and we would generally try to get rid of them and succeeded in doing that in almost all the continents of the world. That’s what happened. So… So, that’s one aspect. So, where would we come into conflict with the machines? And the answer actually is really simple, and it’s illustrated actually by the legend of King Midas. 
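[Editor’s note: a small worked example of the expected-utility idea Russell has just described. The outcomes, probabilities, and utility numbers are invented; the point is only that each action defines a “lottery” over futures and the rational choice maximizes expected utility.]

```python
# Utility of each possible future (made-up numbers).
utilities = {"arrive on time": 10.0, "arrive late": 2.0, "accident": -1000.0}

# Each action is a lottery: a probability distribution over those futures.
actions = {
    "drive recklessly": {"arrive on time": 0.70, "arrive late": 0.25, "accident": 0.05},
    "drive normally":   {"arrive on time": 0.50, "arrive late": 0.499, "accident": 0.001},
}

def expected_utility(lottery):
    return sum(p * utilities[outcome] for outcome, p in lottery.items())

for action, lottery in actions.items():
    print(action, round(expected_utility(lottery), 2))
print("rational choice:", max(actions, key=lambda a: expected_utility(actions[a])))
```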
So, as you know, King Midas is very greedy, and he says to the gods, “I want everything I touch to turn to gold.” And then of course, his water turns to gold as soon as he puts it to his lips, and he can’t drink. And his food turns to gold, and his family turns to gold, and he dies in misery and starvation. So, we are really bad at specifying objectives. And every culture has legends like this where, you know, you give three wishes to the genie. You know, what’s your third wish? Please undo the first two wishes and like that stuff, right? And these stories are over and over again. And then, so this is how you end up in conflict. We specify objectives according to the standard model. The machine is more powerful than us. So, it’s going to achieve those objectives. And if they’re not perfectly aligned with what humans want the future to be like, then you’re creating a conflict. It’s just like having a chess match, right? Because you’ve got, you know, an objective which is opposite to the one that the machines have. And we don’t want to be in a chess match where the stakes… You know, as I was playing chess on my phone on the plane, and I use this app called lichess, and you know, I’m not a terrible chess player, but, you know, lichess makes moves. It takes about a 10th of a second and to beat me every time. You know, so we don’t want to be in that chess match Pat Joseph: Yeah. Stuart Russell: when the stakes are the future of humanity. So, what happened, actually, and the ideas that are in the book was basically to say, the only way forward is to abandon the standard model, to abandon this idea that AI systems are objective achieving engines, right, where we specify the objectives and it achieves those objectives. And so, the idea that came up with is… actually, if it’s possible that humans could specify the objective incorrectly, right? But the AI system is operating as if that objective was, you know, sent by God, right? It’s absolutely biblically true. This is the objective, and I have to pursue it at all costs, right? So, it’s treating as if the objective, as if it was certain knowledge. But if it can be wrong, then obviously it shouldn’t be doing that, right? So, we build AI systems that know that they don’t know what the objective is, right? The objective is whatever humans want the future to be like. That’s the objective. But the AI system knows that it doesn’t know what that is, right? And this is the different kind of AI system. Initially, it sounds sort of, well, how could that even work, right? How can it pursue an objective if it doesn’t know what the objective is? But we do this all the time, right? You go into a restaurant. The chef wants you to have a good meal, but the chef doesn’t know what you like. So, what do they do? They have a thing called a menu, right? The menu is how you communicate your preferences to the chef. And that way, hopefully, what the chef does is aligned with the culinary future that you are actually interested in, right? So, that’s it, you know. Another interesting example is when you have to buy a birthday present for your loved one. Right? And so, here, this is exactly analogous, right? The only thing you care about is how happy your loved one is with the present, because you don’t know how happy they’re going to be with any particular present you might buy them. This is exactly the situation that the AI system is in. So, what do you do? You have one of your kids find out, right? 
Or you, you know, you leave pictures of things around the house and see which ones, “Oh, that looks like a nice, you know, sailboat,” or “Really, you know, I really like that watch,” or whatever it might be, right? You know, so you try to get clues. You, you know, you review your past history of failure and you try to get there, right? And, you know, and sometimes you even ask directly, right? So, it’s absolutely entirely possible to define this problem mathematically. And I don’t want to go into too much nerdy stuff, but game theory is the branch of economics that deals with decision problems that involve two or more entities. And so, here, there are at least two entities. There’s at least one machine and at least one human. And so, in game theory, you can literally just formulate this approach mathematically. We call it an assistance game because the robot is supposed to be of assistance to the human. And you can show that, if the machine solves these assistance games, then the human in the game is guaranteed to be happy with the outcome, right? And during the game, information is flowing from the human to the machine about what the human wants. So, the human could make requests, for example. And that’s evidence for the machine. It’s not… It doesn’t become gospel truth. It’s just evidence. It’s a piece of evidence, right? And we were talking earlier about how you give instructions to an automated taxi. And the idea, you know, if you’re late for your plane, you might say, you know, “Get me to the airport as fast as possible.” Right? Do you know. So you hope that the AI system does not take that literally. Right? Pat Joseph: You’re in for a wild ride. Stuart Russell: Because, you know, and there might be, you know, dead pedestrians strewn in your path and so on. So, you don’t mean that, right? And so, we almost never mean, right. You know, so you say, you know, “Could I have a cup of coffee?” That does not mean that I need coffee at all costs, that you are entitled to mow down all the other people in Starbucks to get me that coffee more quickly. Right? And if we’re in the middle of a desert, no, I don’t want you to drive 500 miles and bring back a cup of coffee, right? Pat Joseph: Yeah. Stuart Russell: I’m just saying I feel like a cup of coffee. So, in these assistance games then, so information flows from everything that the human does. Some of it will be deliberate, the human wants the machine to understand more about what the human wants. And some of it is just a side effect of the human going about their daily visits and even without the human, right? So, even just, if you just went into a room and you saw that, you know, someone had put a vase on the mantelpiece, but they had not put it on the edge, but at the back of the mantelpiece so it doesn’t fall off, you know, that tells you something about the fact that we value the vase and we don’t want it to be broken. Right? So, there’s a massive amount of information about human preferences just in the state of the world, because the state of the world is how we have made it in order to try to realize our preferences. And so, you can read back from the state of the world a lot about what humans want the world to be like. Pat Joseph: Well, I have more questions, but now’s a good time if you have your questions to pass them. Let’s say pass them to your left, and the ushers will pick them up, and then my managing editor will vet them for me. Okay. We’ll continue talking while that happens. 
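[Editor’s note: a minimal sketch of the assistance-game idea described above, in which the machine is uncertain about the human’s objective and treats the human’s behavior as evidence rather than as gospel truth. The candidate gifts, the observed behavior, and the likelihood numbers are invented; the real formulation is a game-theoretic model, not three hand-set probabilities.]

```python
# Prior over hypotheses about what the human actually wants.
prior = {"sailboat": 1 / 3, "watch": 1 / 3, "book": 1 / 3}

# Likelihood of an observed behavior ("the human lingered over the watch photo")
# under each hypothesis.
likelihood = {"sailboat": 0.1, "watch": 0.7, "book": 0.2}

def bayes_update(prior, likelihood):
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

posterior = bayes_update(prior, likelihood)
print({h: round(p, 2) for h, p in posterior.items()})

# The machine keeps its uncertainty: it can act on its best guess, ask a
# clarifying question, or defer, rather than pursuing a fixed objective at all costs.
best_guess = max(posterior, key=posterior.get)
print("tentative choice:", best_guess, "with confidence", round(posterior[best_guess], 2))
```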
But one of the things that I find most disconcerting as an observer is just the division between people in the field. You mentioned accelerationists, so there’s an accelerationist camp and a decelerationist camp. And I just can’t help but think of this car that’s speeding toward a cliff. And let’s say dad is at the wheel and he thinks we can jump the chasm if we just step on the gas and we really need to, you know, and mom is like wrestling for the wheel and trying to pump the brake, and the kids are in the back screaming. I feel like the kid in the back screaming like, what the hell is going on? You know, make up your mind. And also, I’d go for the brakes. Right? Stuart Russell: Right. You know, and there’s another branch actually, you know, the ones who say, you know, “Don’t worry. We will never actually achieve super intelligent AI.” So, that’s the guy saying, “Don’t worry, we’re going to run out of gas before we get to the cliff.” You know, it’s crazy. Pat Joseph: Well, so I think the best illustration of this is the, I think it was the 2018 Turing Award winners. The Turing Award, for those who don’t know, is like the Nobel Prize of computer science. And Geoffrey Hinton, Yoshua Bengio, and Yann LeCun were the three who shared the prize. And they were, I guess, the pioneers of developing neural networks, which I think are at the heart of artificial intelligence or maybe of large language models, but- – [Stuart] Yep. – Okay, good. I feel like… Well, two of them, but let’s just take Geoffrey Hinton. Geoffrey Hinton has quit Google in order to warn people about the dangers of AI. And he has even said that he regrets his life’s work. Yoshua Bengio, I think, is in a similar camp. And Yann LeCun dismisses both of them and says their worries are overblown. And I think that that’s just such an amazing illustration of the division. But the other thing that I wondered, because Geoffrey Hinton, and I’ve only seen him on 60 Minutes and a couple other things, and he seems like just a great human being. But he also strikes me as like Dr. Frankenstein, who has been working on the monster in his lab for all these years. Kind of like your metaphor of, you know, we’ll run out of gas before we get there. It’s like he never expected the monster to wake up. And then it woke up and he was like, “Oh, crap.” You know? Stuart Russell: Yeah. I mean, I didn’t worry about this. I mean, interestingly, you know, Alan Turing, who is the founder of computer science, so you mentioned the Turing Award. It’s named after Alan Turing. So, in 1951, he gave a speech where he basically said, you know, once the machine thinking method had started, it would not take long to outstrip our feeble powers. Eventually, therefore, we should have to expect the machines to take control. Period. End of story. Unfortunately, that speech was never published as such. So, we have his typescript from his estate, and you can go to turingarchive.org and see, you know, see a scan of the typescript. And so, it was not widely considered in the AI community for a long time. And you know, when I started worrying about this, I mean, I was as guilty as Geoff Hinton, except I changed my mind around 2013. But I started giving talks and I was trying to explain how frustrating it was that, not just the AI community, but actually the whole world was ignoring this question. So, I had this fictitious email exchange, where an email arrives from a superior alien civilization at canismajor.universe to humanity@un.org, and it says, “Be warned.
We will arrive in 30 to 50 years time.” Right. Because back in 2013, we thought it would take 30 to 50 years to achieve AGI. And of course, humanity@un.org replies, “Humanity is currently out of the office. We we will respond to your message when we return,” with a little smiley face. And that’s how it felt, right? Which is complete obliviousness and then denial. Right? And there’s this sort of, it’s not that surprising that if you tell people that their life’s work is leading to this destruction of humanity, they’re going to come up with reasons why you’re wrong. Pat Joseph: Yeah. Stuart Russell: Oh, sorry, you’re right. Okay, I’m switching to, you know. Pat Joseph: I’ll become a barista. Stuart Russell: Exactly. Right. So… But the types of denial are embarrassing. I mean, we had renowned AI professors say things like, you know, “Electronic calculators are better than humans at arithmetic and they haven’t taken over the world, so there’s nothing to worry about.” Right. I mean, a five-year-old can see through that kind of argument, and this is, you know, leading professors saying these things, and, you know, bless his heart, Yann LeCun keeps saying, “Well, you know, there’s only going to be a problem if we put in self preservation as an objective into the system, and then it’s going to be difficult.” But although, we just don’t have to do that. Pat Joseph: Because he’s saying we can unplug it. If it hasn’t, we can somehow hit the kill switch if it’s- Stuart Russell: If it doesn’t have self preservation, then it’ll always, right. But unfortunately, in the standard model, right, if you ask a robot to fetch the coffee, it’s going to have self preservation as a sub goal because you can’t fetch the coffee if you’re dead. Right? It does not take a genius to figure this out. Pat Joseph: Yeah. Stuart Russell: And so, the first thing the robot’s going to do is disable its off switch so that nobody can prevent it from getting the coffee. Right. So, you don’t put self preservation in as an objective, it follows as a consequence of building AI systems in the standard model. Pat Joseph: Yeah. Stuart Russell: We’ve had this argument with Yann LeCun over and over again, and he just keeps saying the same thing again. He just like resets and repeats his position. And so, it’s a little frustrating to have these discussions, but I feel like actually things are coming around. And when GPT-4 came out, which was in March of last year, so just over a year ago, and then Microsoft produced that paper. They had worked with GPT-4 for several months at that point. So, they wrote a paper, a very distinguished group of authors wrote this paper saying it exhibits sparks of AGI. And then several of us got together and wrote what’s called the pause letter, right, which says an open letter asking that people not develop systems more powerful than GPT-4, so that governments would have time to figure out how to regulate things properly. And that letter, I think, was what caused humanity to return to the office and read the email from the aliens. And what happened after that was actually quite similar to what would happen if we really did get an email from the aliens. Pat Joseph: Yeah. Stuart Russell: There were, you know, emergency messages going out from UN headquarters to all the governments of the world. There were emergency meetings in the White House. There was, almost instantly, China wrote some extremely strict regulations on large language models, or pretty much put them out of business. 
Pat Joseph: Do you think, because that was interesting. A lot of people, the response to that letter was, “Well, that’s never going to work. You know, no one’s going to pause.” But was that the point to get the response to get- Stuart Russell: The main point, oddly enough, there were no systems more powerful than GPT-4 that were released in the following six months. So, yeah, you’re right. A lot of people said, “Oh, you know, how naive is that?” But in fact, the the pauses did happen. Pat Joseph: Yeah. Stuart Russell: But the main goal was to get governments to listen. Pat Joseph: Yeah. Stuart Russell: And they were, I think, primed to listen because of ChatGPT, they had used it. Right. You know, five years ago, I could try to explain to a politician what, you know, what artificial intelligence is and, you know. And by the time I got to trying to explain, you know, why it might present a risk, you know, they’d fallen asleep or they were looking at their watch or, you know, asking whether I had a donation to make or whatever. Pat Joseph: Yeah. (laughs) (audience laughs) Stuart Russell: But now I don’t have to do anything. They get it. They ask me, you know. How do we control this thing? What do we do? Pat Joseph: Well, and the other thing that’s striking about it is that even as you and I are talking about ChatGPT, it’s like not the same river twice, right? That whole thing. It’s getting better all the time. Right? It’s not the same product every time you go back to it. Kind of like Google Translate has gotten so much better on these tools. You know, every time you use ’em, you can notice, wow, it didn’t use to get that. And now, it does. So, there is a really chilling, I think it’s called the hard takeoff scenario where you get AI programs creating recursive programs, so making themselves better. Stuart Russell: Yeah. Pat Joseph: And just that process running away from us. So, this seems to me the nightmare scenario where if that happens, then it’s too late. Stuart Russell: Yeah. I mean, that’s one of the scenarios that people are very concerned about. Pat Joseph: Was it clear what I said about that? I might not have- Stuart Russell: So, the idea is that, at some point, you know, and it’s already happening, AI systems are actually pretty good at writing code. And you can ask them questions about AI research and they can tell you something that sounds sensible. I’d say we’re not at the point where they could write code to make a better version of themselves. But that’s something that people are testing Pat Joseph: Yeah. Stuart Russell: constantly. This idea of red teaming is, can I, you know, working in a sort of securer sandbox, can I get the system to start improving its own operations? And if that happens, then, you know, version two would be more intelligent than version one. So, by assumption, it would be even better at doing AI research. And so, version three would be way better than version two. And version three would be incredibly good at AI research and then produce the version four, that was way better. And this could happen, you know, in literally days. Pat Joseph: Right. That’s the other thing. The speed at which these things happen, I think, is hard for us to appreciate. And I think that, for me, the visceral reaction to ChatGPT the first time I used it was the speed with which it spat out pages of perfectly comprehensible, perfectly punctuated text. And as a writer, it just made me… You know? It is like John Henry just lost the battle to the steam engine, right. 
You know, I can’t do that. So, this is the other thing, and I’m sure people, Esther, whenever you have some questions, I’ll take ’em, but I’m sure this is going to be a question: what are we going to do for work? What jobs are we going to have left to us in the future if we’ve created super intelligent artificial intelligence? Stuart Russell: It’s interesting because- Pat Joseph: Thank you. Stuart Russell: I went to an economics workshop about, I think, three weekends ago. And for most of the last hundred years, economists have said, “Look, you know, we have theorems. There is no such thing as technological unemployment.” And the, you know, the theorem is very straightforward because, you know, if you make labor more productive, then labor has higher value, so there’ll be more demand for labor, right? And anyone who thinks otherwise is just a Luddite. And now, at this workshop, a lot of leading economists, they finally get it: yes, there’ll be lots of demand for labor, but it won’t be human labor. Right? And, you know, and then, you know, it’s almost like a sort of a conversion moment, you know? Then they start building models of what happens, and oh, golly, human wages go to zero. And, you know, and as you mentioned, writing, right? This is already, we’re seeing this already, and there are marketplaces where you can bid for writing tasks, and prices are dropping precipitously on those marketplaces. Pat Joseph: Yeah. And I remember the first, again, the speed at which these things are changing, because the first time, somebody told me that AI was writing sports articles. Well, that makes sense that they would start there. It’s pretty simple, scores and box scores, and it was really bad. Now, it’s really good. Stuart Russell: Yeah. I mean, they can watch the basketball game. Pat Joseph: Yeah. Stuart Russell: And then write an article about what happened, describe it, and you know, and they, you know, they got the jargon, they got the writing style. You know. And if you want, you can have the basketball report in the style of Emily Dickinson. (audience laughs) Pat Joseph: She wasn’t a big basketball fan. (laughs) Someone wants to know, when we use these systems, are we involved in helping to train them? Stuart Russell: That’s an interesting question. And I believe that in the early days, OpenAI was collecting all the conversations that ChatGPT had with users and then using that data to retrain, to retrain the system. But there are huge privacy issues with that, because people use them in companies and, you know, they put in data or prompts that have company proprietary information. So, apparently, it became common knowledge among consultants that they could actually, you know, on behalf of their clients, they could find proprietary information about their clients’ competitors by simply asking ChatGPT, which is not good. And, you know, and so… I know many companies who literally have banned the use of commercial large language models because they don’t trust that the conversations will remain proprietary. They won’t be sent back to the system for subsequent retraining. I believe that OpenAI sincerely, you know, they say, “We are not doing that.” You know, I believe them, unless, you know. So, unless you opt in, I think your conversation now is private and the system forgets that it ever happened. But this actually I think is, you know, it’s a property that we would like for all kinds of systems, even just for search engines, browsers, right.
I want it to have a stamp on it that says, “I will forget everything that that just happened.” Pat Joseph: Right. Stuart Russell: And this is an absolute mathematical guarantee, and it’s something that computer scientists know how to prove about a given software system that it will forget the interaction. And this should become a standard that we can all trust and everyone understands what it means. Pat Joseph: Is there a resistance to that from business? Stuart Russell: I think the software industry for… its entire history has gone completely unregulated. And they just make this argument that regulations stifles innovation. You know, and they make a little tape loop that plays in the ears of congress people while they go to sleep every night. Pat Joseph: Yeah, it’s funny- Stuart Russell: It’s kind of interesting. You know, so you go to these meetings and the industry people say, “Oh, you know, you can’t have regulation. You know, it just would kill the industry.” They’ve all flown there on highly regulated airplanes. They would not get into an unregulated airplane if you paid them a billion dollars, right, they’re eating highly regulated food. My friend Emily runs the food standards agency in the UK. There are far more rules on sandwiches than there are on software. Pat Joseph: Yeah. Stuart Russell: And these tiny little sandwich shops, you know, have got to get like half a dozen permits. They’ve got to train all their employees. They’ve got to label their stuff. They’ve got to check the sourcing of their materials. They’ve got to watch out for the shelf life. They’ve got to have random inspections. And these trillion dollar companies just say, “Oh, you know, we can’t fill out a form. You know, that would stifle our innovation if we had to fill out a form.” It’s pathetic. Pat Joseph: Well, that leads into this question. Someone asks, given that the government hasn’t been successful at regulating social media, how much faith do you have in the government regulating AI? And how would you recommend they do this? Stuart Russell: Yeah, I think that’s a great question. And it’s absolutely clear that legislators recognize that. They totally blew it on social media. Everyone will say, “We don’t want to repeat that mistake.” But they, you know, they ask, “Well, what should we do?” And they’re listening. You know, the Senate has held multiple hearings where they’ve invited AI people to come and talk about, you know, how do we regulate and what’s going to be effective. And I think there’s a lot of stuff we could do that would be pretty easy. A simple thing would be that we have a right to know if we’re interacting with a human or a machine. Period. That’s very easy to put into law. And you could define disclosure standards. So, just like, you know, we have disclosure standards for those really annoying direct mail things about credit cards, right? They have to have in big type what is the percentage rate, the interest rate on the card. What is the grace period? Those are mandated by Congress. And they can do that. So, they could just mandate disclosure into some standard format. You are interacting with a machine. I am a machine. You know what, maybe you use a typeface, you know, like a, you know, one of those sort of digital looking typefaces. So, it’s clear that this is a computer typing to you, not a human. Don’t call yourself, you know, Maria or Joe or whatever. You know, you’re chatbot 1362. Pat Joseph: Yeah. 
Stuart Russell: So, that would be a good place to start just to sort of, you know, wake up those long dormant muscles. Pat Joseph: So, that’s just getting the ball rolling. Stuart Russell: Yeah. I think the most important type of regulation is regulation that will cause the developers to do the research on safety that they haven’t done. And the principle is completely straightforward. And we have it for medicines, we have it for airplanes, we have it for nuclear power stations. Before you can access the market, you have to show that your product is safe. And what form that evidence takes varies. I mean, in with medicines, it’s clinical trials, and those clinical trials give you a pretty good statistical guarantee. So, with high probability, you could take this drug and it won’t kill you. Right? With nuclear power stations, they have to provide a mathematical proof that the mean time to failure of their design is 10 million years or more. And that mathematical proof is very, very extensive. I mean, if you had to put it in paper, it would be, you know, hundreds of thousands of pages. And so, this is the kind of thing we should do. So, what would we require them to prove? It’s a little more difficult ’cause AI is so general that the notion of safety for an airplane is it doesn’t hit the ground when it’s not supposed to. Right. The notion of, you know, a safe medicine, it doesn’t kill you or damage major organs, you know. For nuclear power, there isn’t a core meltdown or a major release of radiation. So, what, I think, we need is here are some behaviors that would be completely unacceptable if the AI system exhibits them, such as, you know, replicating itself without permission, advising terrorists on how to build biological weapons, breaking into other computer systems, right. Defaming real individuals. Anyone in the street would say, “Yeah, of course they shouldn’t do that.” And then they ask the trillion dollar corporation, “Well, you know, can you guarantee that your system isn’t going to do those things?” And they say, “Nope, Nope. In fact, quite likely our systems are going to do those things.” Pat Joseph: Yeah. (audience laughs) Stuart Russell: And that I think, you know, is an unacceptable state of affairs. So, if the government says, “Okay, well, sorry, until you can come back with the answer, ‘Yes. We can show that our systems are safe in those senses,’ you can’t access the market.” They would have to do the research to figure out how to understand and predict and control their own systems. Pat Joseph: Yeah. In the book, you talk about provably beneficial AI. Right? That’s the goal. Stuart Russell: Yeah, so the goal, you know, particularly because we’re talking about the fate of humanity here. Right? A hand wavy argument. You know, so what they do now is this red team, they go, “You know, we hired some smart guys from Palo Alto Junior College, and you know, and then we gave them a week, and they couldn’t get the system to, you know, to replicate itself or something.” And so, that’s good. Pat Joseph: So, it’s good enough? Stuart Russell: Yeah. Right. No, that’s, you know. And the numbers I’m hearing, I mean the companies think, yeah, there’s maybe a 5% or 10% chance that they’re going to lose, you know, that, at some point, we will lose control, and face this extinction scenario. 
So, I, you know… I said, okay, so you are going to come into my house with a revolver that has, you know, ten chambers and one bullet, and you’re going to put it to, you know, going to line up all my children and everyone’s children, right, in a long line, and you’re going to put that revolver to the head of the first child and fire the gun. And if the bullet is there, it’s going to pass through the heads of every child in the world. I don’t think so. Pat Joseph: But there’s only a one-in-ten chance that it’s… I mean… (laughs) Stuart Russell: So, right. So, there’s a 90% chance that they become the richest people that the world has ever seen, and there’s a 10% chance that they kill everyone. Pat Joseph: Yeah. Stuart Russell: It’s not- I don’t think they’re entitled to do that. Pat Joseph: So, this I think is a good one because it follows on what we were talking about with regulation. So, the way it’s phrased here is, what is a bigger threat, generalized AI or improper, malicious training of the models? And I guess what I’m thinking is maybe they had something different in mind. ’Cause they say, for example, garbage in, garbage out, but I’m also thinking about malicious actors. So, regulations are great for actors who, you know, are following the law or feel compelled to follow the law. But, you know, what do we do about Vladimir Putin or others who won’t care? Stuart Russell: Yeah. So, there, I mean, there are failure modes where the AI system does things we don’t want because, you know, even though it’s pursuing a reasonable objective, it misunderstands the world. But usually, you know, a system that misunderstands the world is going to be easier to defeat. Pat Joseph: Right. Stuart Russell: The worse its understanding, the weaker it is. So, imagine if you’re playing chess against a computer program and the computer program actually misunderstands some of the rules of chess. At some point, it’s going to make an illegal move and then you win automatically. So, misunderstandings by AI systems actually make them weaker, even though they will still be defective. But the malicious use of AI, so even if we, you know, going back to the assistance games idea, even if we do have provably beneficial AI systems, and we can mandate that, you know, here is a template, you’ve got to build your AI systems this way because this is the way we know is actually safe. Then, as you say, Putin, or what we call the Doctor Evil problem, right? Doctor Evil doesn’t want to build provably beneficial AI systems. He wants to build AI systems that help him take over the world. And so, how do you stop that? And if you look at our success with preventing malware and cybercrime, right? It fills you with confidence. You know, I read an interesting number, I don’t know where they get it from, but it seems a pretty consistent estimate. If you look on the web, how much does cybercrime cost the world? It’s $7 trillion a year. How much revenue does the software industry generate? $700 million- Sorry- $700 billion a year. Pat Joseph: $700 billion versus $7 trillion. Stuart Russell: So, for every dollar we spend on software, it costs us $10 to deal with the defects in the software that enable cybercrime. I don’t know how accurate these numbers are, but that tells you how successful we are with malware. And it’s really hard to prevent because, you know, software is produced by typing and it replicates infinitely and moves at the speed of light.
Stuart Russell: And it can be obfuscated, meaning that it can be written in such a way that you can’t even tell what it is and so on. So, I think if we’re going to prevent that type of malicious AI deployment, the place where there’s a bottleneck is in the hardware, ’cause there’s only a handful of manufacturers of high-end hardware in the world. And to become one of those, so if you wanted to bypass that and do it yourself, it’s going to cost you about a hundred billion dollars to create the manufacturing capability. And you need tens of thousands of highly trained engineers and really, you know. So, ASML in the Netherlands is the company that produces the machines that then TSMC uses to make the chips. And these are extreme ultraviolet etching machines. They’re the only company in the world who knows how to make those. Right. So, it’s really, really difficult to become a rogue manufacturer of high-end chips. So, what you do is you require that the chips themselves are the police, right? That the chips say, “I am not going to run a software object that doesn’t come with an appropriate authority.” Right? And this can be done in two ways. What we do now is we have licenses. Right? And you know, when you download software off the web, your laptop, you know, the browser is checking the authority of the license. You know, is it up to date? Is it from a valid issuer? And it won’t let you run stuff that’s not authorized. But you can actually do something much better than that, which is to require a proof of safety. And the hardware can check that proof. And this is a technology called proof-carrying code that was developed by George Necula, who was one of my colleagues at Berkeley. And so, the hardware can check those proofs. So, you don’t need an authority. It doesn’t have to be that the government gives you a license to run software. It doesn’t matter. The software has to come with a proof. And if the software isn’t safe, the proof won’t work and so the hardware won’t run it. And that approach is, I think, a feasible approach to doing it. But as you can imagine, getting all of that in place, right, sort of replacing the whole stack, is going to be a huge undertaking, but we have to do it. Pat Joseph: Yeah. Someone wants to know about quantum computing. If the fragility of qubit coherence is overcome, what role will quantum computing play in the development of AI? Stuart Russell: Yeah, that’s a great question. So, quantum computation basically uses weird properties of quantum wave functions to get more computing power out of a fixed amount of hardware. So, you might have this idea that, you know, the amount of computing that can get done on, you know, a hundred computing objects is a hundred times as much as you can get out of one computing object. And if you went to a thousand, that would be bigger. With quantum computation, it’s nonlinear in the amount of hardware that you have. Right? And so, if we had even a hundred-qubit computer, if those were reliable qubits, a qubit being a quantum bit, that would probably be more powerful than any computer we have. Pat Joseph: Yeah. Stuart Russell: So, what impact would that have? It would mean almost certainly the energy costs of computation would be dramatically reduced. And there’s a lot in the media about, “Oh my god, you know, AI is guzzling all the electricity in the world and is, you know, doubling the amount of CO2,” this is all nonsense. Pat Joseph: Nonsense, right?
Stuart Russell: The actual amount is somewhere on the order of 0.1% of electricity consumption for AI. Computing in general is maybe 1% to 2%. And so, a lot of the numbers in the media, I hope there are not too many journalists in the audience. Journalists maybe have a hard time with millions and billions and things like that. Pat Joseph: Yeah. (audience laughs) Stuart Russell: Yeah, so- Pat Joseph: A million is just, a billion is just a little more than a million. Stuart Russell: It’s a bit more. Pat Joseph: Yeah. Stuart Russell: So, yeah. But it is growing fast and it is, you know, it’s significant. But the main impact would be that computations that might now take a billion years might only take a few seconds. And so, from the point of view of AI, as an AI researcher, it sucks, right? Because instead of having to do some research to understand how it is that one can manage the immense complexity of life in the world using a very small amount of computation, you say, “Oh, to hell with it, you know. We’ll just use a vast amount of computation,” instead of being intelligent about it. Pat Joseph: Yeah. Stuart Russell: And so, you, you could sort of skip over a lot of AI research. Pat Joseph: Yeah. Stuart Russell: And just brute-force the hell out of every problem. So, it’s disappointing for me if that happens. But, you know, we could perhaps use that capability to solve some really, really hard problems that we want solved. Pat Joseph: Yeah. If we get that far. Stuart Russell: If we get that far. And the physicists and the mathematicians involved are brilliant and creative, and I think they’re going to succeed. Pat Joseph: Okay. Great. Well, I’m getting the signal that we need to wrap up, but I hope you enjoyed this discussion. Thank you for your wonderful questions. We will have a short reception in the atrium. I believe it’s called the atrium afterward, and we’re working on getting another guest here for September. I don’t know who it is yet, but I’m sure they’ll be very smart and very charming and good-looking, and… So, thank you. Appreciate it. (audience applauding) Thank you.
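[Editor’s note: a heavily simplified sketch of the proof-carrying-code idea mentioned earlier, in which software ships with a certificate that the host checks before running anything, and refuses to execute if the check fails. In real proof-carrying code the certificate is a formal, machine-checkable proof about the code’s behavior; the hash-based “certificate” below is a stand-in that only illustrates the check-before-run control flow and proves nothing about safety.]

```python
import hashlib

def make_certificate(code: str, claimed_property: str) -> str:
    # Stand-in certificate: a hash binding the code to a claimed property.
    return hashlib.sha256((code + "|" + claimed_property).encode()).hexdigest()

def host_run(code: str, claimed_property: str, certificate: str) -> None:
    # The host recomputes the expected certificate and refuses to run on mismatch.
    expected = hashlib.sha256((code + "|" + claimed_property).encode()).hexdigest()
    if certificate != expected:
        print("certificate invalid: refusing to run")
        return
    print(f"certificate ok (claims: {claimed_property}); running code")
    exec(code)  # in real proof-carrying code, the checked proof justifies this step

code = "print('hello from a checked program')"
cert = make_certificate(code, "does not access the network")
host_run(code, "does not access the network", cert)        # runs
host_run(code, "does not access the network", "tampered")  # refused
```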
