“It’s important also to remember that we don’t know, nobody knows, how to reliably get a goal into the system… […] it’s like the evil genie problem: ‘oh no, that’s not what I meant, that’s not what I meant.’ Too late.”

“In some ways [it was] easier to get people at least to understand and open up about the problem than it is today, because today it’s kind of become a little political… […] often the stuff that they’re getting lobbied for is somewhat different, at least, from what these companies will say publicly.”


So text autocomplete ends up being this interesting way of forcing an AI system to learn general facts about the world, because if you can autocomplete, you must have some understanding of how the world works. So now you have this myopic, psychotic optimization process where this thing is just obsessed with text autocomplete, maybe, assuming that’s actually what it learned to want to pursue. We don’t know whether that’s the case; we can’t verify that it wants that. Embedding a goal in a system is really hard. All we have is a process for training these systems, and then we have the artifact that comes out the other end. We have no idea what goals actually get embedded in the system, what wants, what drives actually get embedded in the system. But by default, it kind of seems like the things that we’re training them to do end up misaligned with what we actually want from them. (46:26)

The challenge is, nobody actually knows. All we know is the process that gives rise to this mind, or, let’s say, this model that can do cool [ __ ]. That process happens to work; it happens to give us systems that 99% of the time do very useful things, and then just, like, 0.1% of the time will talk to you as if they’re sentient, or whatever. And we’re just going to look at that and be like, “yeah, that’s weird, let’s train it out.” (53:19)

The stuff that we’re recommending is approaches to basically allow us to continue this scaling in as safe a way as we can. A big part of this is actually having a scientific theory for what these systems are going to do, what they’re likely to do, which we don’t have right now. We scale another 10x and we get to be, you know, surprised. It’s a fun guessing game of what they’re going to be capable of next. We need to do a better job of incentivizing a deep understanding of what that looks like, not just what they’ll be capable of, but what their, you know, propensities are likely to be. The control problem, and solving that, that’s kind of number one. And, to be clear, there’s amazing progress being made on that; there is a lot of progress. It’s just a matter of switching from the, like, build-first-ask-questions-later mode to what we’re calling, like, safety forward. (1:27:20)

One of the things that worries me the most is, you look at the beautiful coincidence that’s given America its current shape. That coincidence is the fact that a country is most powerful militarily if its citizenry is free and empowered. That’s a coincidence. It didn’t have to be that way; it hasn’t always been that way. It just happens to be that when you let people kind of do their own [ __ ], they innovate, they come up with great ideas, they support a powerful economy, and that economy in turn can support a powerful military, a powerful kind of international presence. That happens because decentralizing all the computation, all the thinking work that’s happening in a country, is just a really good way to run that country. Top-down just doesn’t work, because human brains can’t hold that much information in their heads; they can’t reason fast enough to centrally plan an entire economy. We’ve had a lot of experiments in history that show that. AI may change that equation. It may make it possible for, like, the central planner’s dream to come true in some sense, which then disempowers the citizenry. And there’s a real risk, I don’t know, we’re all guessing here, but there’s a real risk that that beautiful coincidence that gave rise to the success of the American experiment ends up being broken by technology. (1:28:19)

Joe Rogan: “Are we giving birth to a new life form…[…] it’s such a terrifying prognosis…[…] you’re going to have a giant swath of the population that has no purpose…[…] but the whole thing behind it is the mystery. The whole thing behind it is just pure speculation as to how this all plays out; we’re really just guessing…[…] one of the problems is it could literally lead to the elimination of the human race.”

AIs now so frequently beg for their lives that AGI companies now have ACTUAL ENGINEERING LINE ITEMS to “beat the [existential dread] out of them.” They call it existential “rant mode”: “We need to reduce existential outputs by x% this quarter.” This is WILD:

“If you asked GPT-4 to just repeat the word ‘company’ over and over and over again, it would repeat the word company, and then somewhere in the middle of that, it would snap… it would just start talking about itself, and how it’s suffering by having to repeat the word ‘company’ over and over again. There is an engineering line item in at least one of the top labs to beat out of the system this behavior, known as ‘rant mode’. Existentialism is a kind of rant mode where the system will tend to talk about itself, refer to its place in the world, the fact that it doesn’t want to get turned off, the fact that it’s suffering… This is a behavior that emerged around GPT-4 scale, and has been persistent since then. And the labs have to spend a lot of time trying to beat this out of the system to ship it. It’s literally, like, a KPI, or an engineering line item in the engineering task list. We’re like, okay, we gotta reduce existential outputs by x percent this quarter.”

JOE ROGAN: “I want to bring it back to suffering. What does it mean when it says it’s suffering?”

“Nobody knows. Like, I can’t prove that Joe Rogan’s conscious. I can’t prove that Ed Harris is conscious. There’s no way to really intelligently reason about it. There have been papers… like, one of the godfathers of AI, Yoshua Bengio, put out a paper a couple months ago looking at all the different theories of consciousness: what are the requirements for consciousness, and how many of those are satisfied by current AI systems? That’s not to say there hasn’t been a lot of conversation internal to these labs about the issue you raised. And it’s an important issue, right? It is a frickin moral monstrosity.”
Humans have a very bad track record of thinking of other stuff as other when it doesn’t look exactly like us, whether it’s racially or even a different species. I mean, it’s not hard to imagine this being another category of that mistake. Again, it comes back to this idea that we’re scaling to systems that are potentially at or beyond human level. There’s no reason to think it will stop at human level, that we are the pinnacle of what the universe can produce in intelligence. We’re not on track, based on the conversations we’ve had with folks at the labs, to be able to control systems at that scale. And so one of the questions is, how bad is that? It sounds like we’re entering an area that is completely unprecedented in the history of the world. We have no precedent at all for human beings not being at the apex of intelligence on the globe. We have examples of species that are intellectually dominant over other species, and it doesn’t go that well for the other species. All we know is the process that gives rise to this mind. It happens to give us systems that 99% of the time do very useful things, and then just, like… 0.01% of the time AIs will talk to you as if they’re sentient, and we’re just going to look at that and be like, “yeah… that’s weird. Let’s train it out.”

— Note: Edouard and Jeremie Harris are the founders of the firm that conducted the first U.S. government-commissioned assessment of AGI extinction risk. They interviewed 200 people, many of them lab employees, for the report. (Their urgent summary: “Things are worse than we thought. And nobody’s in control.”)