LEX. One of the topics I care a lot about, artificial intelligence, you’ve had great public and private conversations about this topic and-

SAM. Yeah, and Elon was very formative in my taking that issue seriously. I mean, he and I went to that initial conference in Puerto Rico together and it was only because he was going and I found out about it through him, and I just rode his coattails to it, you know, that I got dropped in that side of the pool to hear about these concerns at that point.

LEX. It would be interesting to hear how your concern has evolved with the coming out of ChatGPT and these new large language models that are fine-tuned with reinforcement learning and seemingly able to do some incredible human-like things. There are two questions. One, how has your concern in terms of AGI and super intelligence evolved, and how impressed are you with ChatGPT as a student of the human mind and mind in general?

SAM. Well, my concern about AGI is unchanged. And so I did a, I’ve spoken about it a bunch on my podcast, but, you know, I did a TED Talk in 2016, which was the kinda summary of what that conference and, you know, various conversations I had after that did to my brain on this topic.

LEX. Basically that once super intelligence is achieved, there’s a takeoff, it becomes exponentially smarter and in a matter of time, they’re just, we’re ants and they’re gods.

SAM. Well, yeah, and unless we find some way of permanently tethering a super intelligent self-improving AI to our value system. And, you know, I don’t believe anyone has figured out how to do that or whether that’s even possible in principle. I mean, I know people like Stuart Russell who I just had on my podcast, are-

LEX. Oh, really? Have you released it yet?

SAM. I haven’t released it yet.

LEX. Oh, great.

SAM. He’s been on a previous podcast, but we just recorded this week.

LEX. ‘Cause you haven’t done an AI podcast in a while, so it’s great.

SAM. Yeah. Yeah.

LEX. It’s great. He’s a good person to talk about alignment with.

SAM. Yeah, so Stuart, I mean, Stuart has been, you know, probably more than anyone, my guru on this topic. I mean, like just reading his book, and I think I’ve done two podcasts with him at this point.

LEX. I think it’s called “The Control Problem” or something like that?

SAM. His book is “Human Compatible.”

LEX. Human Compatible.

SAM. Yeah, he talks about the control problem. And yeah, so I just think the idea that we can define a value function in advance that permanently tethers a self-improving, super-intelligent AI to our values as we continue to discover them, refine them, extrapolate them in an open-ended way, I think that’s a tall order. And I think there are many more ways, there must be many more ways of designing super intelligence that is not aligned in that way and is not ever approximating our values in that way. So, I mean, Stuart’s idea, to put it in a very simple way, is that he thinks you don’t wanna specify the value function upfront. You don’t wanna imagine you could ever write the code in such a way as to admit of no loophole. You want to make the AI uncertain as to what human values are and perpetually uncertain, and always trying to ameliorate that uncertainty by hewing more and more closely to what our professed values are. So, like, it’s always interested in us saying, oh, no, no, that’s not what we want. That’s not what we intend. Stop doing that, right? Like no matter how smart it gets, all it wants to do is more perfectly approximate human values. I think there are a lot of problems with that, you know, at a high level. I’m not a computer scientist, so I’m sure there are many problems at a low level that I don’t understand.
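To make the idea Sam is describing concrete, here is a minimal, invented sketch, not Russell’s actual formalism and not any real system: an agent that never commits to a fixed value function, but instead keeps a probability distribution over candidate human reward functions, treats a human correction as evidence, and updates. Every action name, hypothesis, and number below is made up purely for illustration.

```python
import math

ACTIONS = ["repave_driveway", "plant_garden", "do_nothing"]

# Competing hypotheses about what the human values (action -> reward).
HYPOTHESES = {
    "values_convenience": {"repave_driveway": 1.0, "plant_garden": 0.2, "do_nothing": 0.0},
    "values_nature": {"repave_driveway": -1.0, "plant_garden": 1.0, "do_nothing": 0.1},
    "values_rest": {"repave_driveway": -0.3, "plant_garden": -0.2, "do_nothing": 0.5},
}

TRUE_HUMAN_VALUES = HYPOTHESES["values_nature"]  # hidden from the agent


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expected_reward(action: str, belief: dict) -> float:
    """Expected reward of an action under the agent's current belief."""
    return sum(p * HYPOTHESES[h][action] for h, p in belief.items())


def update(belief: dict, action: str, objected: bool) -> dict:
    """Bayes-style update: a human objection is evidence for hypotheses that
    assign low reward to the action, and against hypotheses that liked it."""
    posterior = {}
    for h, rewards in HYPOTHESES.items():
        approve_prob = sigmoid(rewards[action])      # P(human approves | hypothesis h)
        likelihood = (1.0 - approve_prob) if objected else approve_prob
        posterior[h] = belief[h] * likelihood
    total = sum(posterior.values())
    return {h: v / total for h, v in posterior.items()}


# The agent starts maximally uncertain about human values, then acts,
# listens for "no, that's not what we want", and updates its belief.
belief = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
for step in range(5):
    action = max(ACTIONS, key=lambda a: expected_reward(a, belief))
    objected = TRUE_HUMAN_VALUES[action] < 0         # simulated human correction
    belief = update(belief, action, objected)
    print(f"step {step}: chose {action}, objected={objected}, "
          f"belief={ {h: round(p, 3) for h, p in belief.items()} }")
```

The point of the toy is only that the agent never locks in a value function: every correction shifts its belief, so “stop doing that” stays informative no matter how capable the action-selection step becomes. Whether that property survives in a genuinely self-improving system is exactly the open question being discussed.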

LEX. Like how to force a human into the loop always no matter what.

SAM. There’s that, and, like, which humans get a vote, and just what is, you know, what do humans value, and what is the difference between what we say we value and our revealed preferences, which, I mean, if you just, if you were a super intelligent AI that could look at humanity now, I think you could be forgiven for concluding that what we value is driving ourselves crazy with Twitter and living perpetually on the brink of nuclear war and, you know, just watching, you know, hot girls in yoga pants on TikTok again and again and again. It’s like, what-

LEX. And you’re saying that is not what we-

SAM. This is all revealed preference and it’s what is an AI to make of that, right? And what should it optimize? Like, so part of, this is also Stuart’s observation that one of the insidious things about like the YouTube algorithm is that it’s not that it just caters to our preferences, it actually begins to change us in ways so as to make us more predictable. Like it finds ways to make us a better reporter of our preferences and to trim our preferences down so that it can further train to that signal. So the main concern is that most of the people in the field seem not to be taking intelligence seriously, like.
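A toy simulation can make the preference-shaping point concrete. This is not any real platform’s algorithm; the topics, numbers, and the crude “exposure drift” rule are invented purely to illustrate how a greedy engagement maximizer can narrow a user’s preferences and thereby make the user more predictable.

```python
import math

TOPICS = ["yoga_pants", "geopolitics", "chess", "cooking"]

# The user's current interest distribution over topics (sums to 1.0).
interest = {"yoga_pants": 0.30, "geopolitics": 0.25, "chess": 0.25, "cooking": 0.20}

DRIFT = 0.10  # how strongly exposure pulls interest toward whatever was shown


def entropy(dist: dict) -> float:
    """Shannon entropy of the interest distribution: lower = more predictable user."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def recommend(interest: dict) -> str:
    """Greedy engagement maximization: always show the currently most-liked topic."""
    return max(interest, key=interest.get)


def watch_and_drift(interest: dict, shown: str) -> dict:
    """The user watches what is shown, and exposure nudges their interests
    toward it (a crude stand-in for a mere-exposure effect)."""
    nudged = {t: p * (1.0 - DRIFT) for t, p in interest.items()}
    nudged[shown] += DRIFT  # the shown topic absorbs the freed-up probability mass
    return nudged


print(f"start: entropy={entropy(interest):.3f}, interests={interest}")
for _ in range(20):
    shown = recommend(interest)
    interest = watch_and_drift(interest, shown)

rounded = {t: round(p, 3) for t, p in interest.items()}
print(f"after 20 recommendations: entropy={entropy(interest):.3f}, interests={rounded}")
```

Running it, the entropy of the interest distribution falls step after step: the recommender never intends to change the user, but a user it has already narrowed is an easier signal to train against than the user it started with, which is the dynamic attributed to Stuart Russell above.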

LEX. Still?

SAM. Yeah, as they design more and more intelligent machines, and as they profess to want to design true AGI, they’re not, again, they’re not spending the time that Stuart is spending trying to figure out how to do this safely, above all. They’re just assuming that these problems are gonna solve themselves as we make that final stride into the end zone. Or they’re saying very, you know, pollyannish things like, you know, an AI would never form a motive to harm humans, like why would it ever form a motive to be malicious toward humanity, right? Unless we put that motive in there, right? And that’s not the concern. The concern is that in the presence of vast disparities in competence, and certainly in a condition where the machines are improving themselves, they’re improving their own code, they could be developing instrumental goals that are antithetical to our wellbeing without any intent to harm us, right? It’s analogous to what we do to every other species on Earth. I mean, you and I don’t consciously form the intention to harm insects on a daily basis, but there are many things we could intend to do that would, in fact, harm insects, because, you know, you decide to repave your driveway or whatever you’re doing, like you’re just not taking the interest of insects into account, because they’re so far beneath you in terms of your cognitive horizons. And so the real challenge here is that if you believe that intelligence, you know, scales up on a continuum toward heights that we can only dimly imagine, and I think there’s every reason to believe that, there’s just no reason to believe that we’re near the summit of intelligence. And you can, you know, define, maybe there’s some forms of intelligence for which this is not true, but for many relevant forms, you know, like the top 100 things we care about cognitively, I think there’s every reason to believe that many of those things, most of those things, are a lot like chess or Go, where once the machines get better than we are, they’re gonna stay better than we are. Although, I dunno if you caught the recent thing with Go, where, this actually came outta Stuart’s lab.

LEX. Yeah, one time a human beat a machine.

SAM. Yeah, they found a hack for that. But anyway, ultimately there’s gonna be no looking back, and then the question is what do we do in relationship to these systems that are more competent than we are in every relevant respect? Because it will be a relationship. The people who think we’re just gonna figure this all out, you know, without thinking about it in advance, that the solutions are just gonna find themselves, seem not to be taking the prospect of really creating autonomous super intelligence seriously. Like what does that mean? It’s every bit as independent and ungovernable, ultimately, as us having created, I mean, just imagine if we created a race of people that were 10 times smarter than all of us. Like how would we live with those people? They’re 10 times smarter than us, right? Like they begin to talk about things we don’t understand. They begin to want things we don’t understand. They begin to view us as obstacles to them solving those problems or gratifying those desires. We become the chickens or the monkeys in their presence. And I think that, but for some amazing solution of the sort that Stuart is imagining, that we could somehow anchor their reward function permanently no matter how intelligence scales, it’s really worth worrying about this. I do buy the, you know, the sci-fi notion that this is an existential risk if we don’t do it well.

LEX. I worry that we don’t notice it. I’m deeply impressed with ChatGPT and I’m worried that it will become super intelligent. These language models could become super intelligent, because they’re basically trained on the collective intelligence of the human species. And then it’ll start controlling our behavior if they’re integrated into our algorithms, the recommender systems, and then we just won’t notice that there’s a super intelligent system that’s controlling our behavior.

SAM. Well, I think that’s true even before, far before super-intelligence, even before general intelligence. I mean, I think just the narrow intelligence of these algorithms and of what something like, you know, ChatGPT can do, even far short of it developing its own goals that are at cross purposes with ours. Just the unintended consequences of using it in the ways we are going to be incentivized to use it, and, you know, the money to be made from scaling this thing, and what it does to our information space and our sense of just being able to get the ground truth on any facts, it’s, yeah, it’s super scary.

LEX. Do you think ChatGPT is a giant leap in the development towards AGI, or is this still just an impressive little toolbox? So like, when do you think the singularity’s coming, or, to you, does it not matter if eventually-

SAM. Yeah, I have no intuitions on that front apart from the fact that if we continue to make progress, it will come, right? So you just have to assume we continue to make progress. There’s only two assumptions. You have to assume substrate independence. So there’s no reason why this can’t be done in silico. It’s just we can build arbitrarily intelligent machines. There’s nothing magical about having this done in the wetware of our own brains. I think that is true, and I think that’s, you know, scientifically parsimonious to think that that’s true. And then you just have to assume we’re gonna keep making progress. It doesn’t have to be any special rate of progress. It doesn’t have to be Moore’s Law. It can just be, we just keep going. At a certain point we’re gonna be in relationship to minds, leaving consciousness aside, I don’t have any reason to believe that they’ll necessarily be conscious by virtue of being super intelligent. And that’s its own interesting ethical question. But leaving consciousness aside, they’re gonna be more competent than we are. And then that’s like, you know, the aliens have landed, you know? That’s literally, that’s an encounter with, again, leaving aside the possibility that something like Stuart’s path is actually available to us, but it is hard to picture, if what we mean by intelligence, all things considered, and it’s truly general, if that scales and, you know, begins to build upon itself, how you maintain that perfect slavish devotion until the end of time in those systems.

LEX. The tether to humans – Yeah. – I think my gut says that that tether is not, there’s a lot of ways to do it. So it’s not this increasingly impossible problem.

SAM. Right, so I have no, you know, as you know, I’m not a computer scientist, so I have no intuitions about, just algorithmically, how you would approach that and what’s somewhat possible.

LEX. My main intuition is maybe deeply flawed, but the main intuition is based on the fact that most of the learning is currently happening on human knowledge. So even ChatGPT is just trained on human data. – [Sam] Right. – I don’t see where the takeoff happens where you completely go above human wisdom. The current impressive aspect of ChatGPT is that it’s using collective intelligence of all of us.

SAM. Well, from what I’ve gleaned, again, from people who know much more about this than I do, I think we have reason to be skeptical that these techniques of, you know, deep learning are actually going to be sufficient to push us into AGI, right? So it’s just, they’re not generalizing in the way they need to, they’re certainly not learning like human children. And so they’re brittle in strange ways. It’s not to say that the human path is the only path, you know, and maybe we might learn better lessons by ignoring the way brains work, but we know that they don’t generalize and use abstraction the way we do, and so.

LEX. Although the interesting-

SAM. The strange holes in their competence.

LEX. But the size of the holes is shrinking every time, and so the intuition starts to slowly fall apart. You know, the intuition is like, surely it can’t be this simple to achieve super intelligence. – Yeah, yeah. (chuckles) – But it’s becoming simpler and simpler, so I don’t know. The progress is quite incredible. I’ve been extremely impressed with ChatGPT and the new models, and there’s a lot of financial incentive to make progress in this regard, so we’re going to be living through some very interesting times.