“The problem of controlling AGI or super intelligence, in my opinion, is like a problem of creating a perpetual safety machine.”
“We can only rule out bugs we find. We cannot rule out bugs and capabilities [just] because we haven’t found them.” (46:37)
“Any work in a safety direction right now seems like a good idea because we are not slowing down. I’m not for a second thinking that my message or anyone else’s will be heard and [we] will be a sane civilization which decides not to kill itself by creating its own replacements.” (1:28:31)
“So again, I’m not inside. From outside, it seems like there is a certain filtering going on and restrictions and criticism on what they can say. And everyone who was in charge of safety, whose responsibility it was to protect us, said, ‘You know what? I’m going home.’ So that’s not encouraging.” (1:32:08)
“It is the most important problem we’ll ever face. It is not like anything we had to deal with before. We never had birth of another intelligence. Like aliens never visited us, as far as I know.” (1:42:57)
LEX. What to you is the probability that super intelligent AI will destroy all human civilization?
ROMAN. What’s the timeframe?
LEX. Let’s say 100 years, in the next hundred years.
ROMAN. So the problem of controlling AGI or super intelligence, in my opinion, is like a problem of creating a perpetual safety machine. By analogy with a perpetual motion machine, it’s impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors. The difference between cybersecurity, narrow AI safety, and safety for general AI, for super intelligence, is that we don’t get a second chance. With cybersecurity, somebody hacks your account, what’s the big deal? You get a new password, new credit card, you move on. Here, if we’re talking about existential risks, you only get one chance. So you are really asking me, what are the chances that we’ll create the most complex software ever, on the first try, with zero bugs, and it will continue to have zero bugs for 100 years or more?
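(Editor’s illustration, not from the episode; the failure rates below are assumed purely for the sake of the arithmetic.) The “zero bugs for 100 years” framing compounds: if a system has an independent probability p of a critical failure in any given year, then

$$P(\text{no critical failure in 100 years}) = (1 - p)^{100},$$

so even p = 0.01 leaves only about a 37% chance of a failure-free century, and p = 0.05 leaves well under 1%.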
LEX. So there is an incremental improvement of systems leading up to AGI. To you, it doesn’t matter if we can keep those safe; there’s going to be one level of system at which you cannot possibly control it.
ROMAN. I don’t think we so far have made any system safe. At the level of capability they display, they already have made mistakes. We had accidents; they’ve been jailbroken. I don’t think there is a single large language model today that no one has succeeded in making do something its developers didn’t intend it to do.
LEX. But there’s a difference between getting it to do something unintended, or something that’s painful, costly, destructive, and something that’s destructive to the level of hurting hundreds of millions of people, billions of people, or the entirety of human civilization. That’s a big leap.
ROMAN. Exactly, but the systems we have today have the capability of causing X amount of damage. So when we fail, that’s all we get. If we develop systems capable of impacting all of humanity, all of the universe, the damage is proportionate.
LEX. What, to you, are the possible ways that this kind of mass murder of humans could happen?
ROMAN. It’s always a wonderful question. So one of the chapters in my new book is about unpredictability. I argue that we cannot predict what a smarter system will do. So you’re really not asking me how super intelligence will kill everyone; you’re asking me how I would do it. And I think it’s not that interesting. I can tell you about the standard ones, you know, nanotech, synthetic bio, nuclear. Super intelligence will come up with something completely new, completely super. We may not even recognize that as a possible path to achieve that goal.
“the smart thing is not to build something you cannot control, you cannot understand. Build what you can and benefit from it. I’m a big believer in personal self-interest. A lot of guys running those companies are young, rich people. What do they have to gain beyond billions they already have financially, right? It’s not a requirement that they press that button. They can easily wait a long time. They can just choose not to do it. And still have an amazing life. In history, a lot of times if you did something really bad, at least you became part of history books. There is a chance in this case there won’t be any history.” (1:30:37)
“In many domains, car manufacturing, drug development, the burden of proof is on the manufacturer of a product or service to show that their product or service is safe. It is not up to the user to prove that there are problems. They have to do appropriate safety studies, they have to get government approval for selling the product, and they are still fully responsible for what happens. We don’t see any of that here. They can deploy whatever they want, and I have to explain how that system is going to kill everyone. I don’t work for that company. You have to explain to me how it definitely cannot mess up.” (1:36:41)
ROMAN. The rational decision changes based on your [ASI] position. When you are under the boss, the rational policy may be to follow orders and be honest. When you become the boss, the rational policy may shift. […] It is the most important problem we’ll ever face. It is not like anything we had to deal with before. We never had the birth of another intelligence. Like, aliens never visited us, as far as I know.
LEX. Similar type of problem, by the way, if an intelligent alien civilization visited us, that’s a similar kind of situation.
ROMAN. In some ways, if you look at history, anytime a more technologically advanced civilization visited a more primitive one, the results were genocide, every single time.
LEX. What gives you hope about the future?
ROMAN. I could be wrong. I’ve been wrong before. […] It still could be a lot smarter than us. And to dominate long term, you just need some advantage. You have to be the smartest, you don’t have to be a million times smarter. So even five X might be enough.
ROMAN. You may be completely right, but what probability would you assign it [p(doom)]? You may be 10% wrong. But we’re betting all of humanity on this distribution. It seems irrational.
LEX. Yeah, it’s definitely not like one or 0%.
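(Editor’s illustration, not from the episode; the expected-value framing is an added gloss on the exchange above.) If p is the chance the pessimistic view is right, L is the loss in that case, and G is the finite gain if it is wrong, pressing ahead is only a good bet in expectation when

$$(1 - p)\,G > p\,L,$$

and with L standing for all of humanity’s future, no plausible G satisfies this even at p = 0.1, which is the sense in which betting everything on this distribution “seems irrational.”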
OUTLINE:
0:00 – Introduction
2:20 – Existential risk of AGI
8:32 – Ikigai risk
16:44 – Suffering risk
20:19 – Timeline to AGI
24:51 – AGI Turing test
30:14 – Yann LeCun and open source AI
43:06 – AI control
45:33 – Social engineering
48:06 – Fearmongering
57:57 – AI deception
1:04:30 – Verification
1:11:29 – Self-improving AI
1:23:42 – Pausing AI development
1:29:59 – AI Safety
1:39:43 – Current AI
1:45:05 – Simulation
1:52:24 – Aliens
1:53:57 – Human mind
2:00:17 – Neuralink
2:09:23 – Hope for the future
2:13:18 – Meaning of life
LEX. Underpinning a lot of your writing is this sense that we’re screwed, but it just feels like it’s an engineering problem. I don’t understand why we’re screwed. Time and time again, humanity has gotten itself into trouble and figured out a way to get out of trouble.
ROMAN. We are in a situation where people making more capable systems just need more resources. They don’t need to invent anything, in my opinion. Some will disagree, but so far at least I don’t see diminishing returns: if you have 10X compute, you’ll get better performance. The same doesn’t apply to safety. If you give MIRI or any other organization 10 times the money, they don’t output 10 times the safety. And the gap between capabilities and safety becomes bigger and bigger all the time. So it’s hard to be completely optimistic about our results here. I can name 10 excellent breakthrough papers in machine learning. I would struggle to name equally important breakthroughs in safety.
ROMAN. If you look at governance structures, then you have someone with complete power; they’re extremely dangerous. So the solution we came up with is to break it up: you have judicial, legislative, executive. Same here: have narrow AI systems work on important problems. Solve immortality. It’s a biological problem we can solve, similar to how progress was made with protein folding, using a system which doesn’t also play chess. There is no reason to create a super intelligent system to get most of the benefits we want from much safer, narrow systems.
LEX. It really is a question to me whether companies are interested in creating anything but narrow AI. I think when the term AGI is used by tech companies, they mean narrow AI. They mean narrow AI with amazing capabilities. I do think that there’s a leap between narrow AI with amazing capabilities, with superhuman capabilities, and the kind of self-motivated, agent-like AGI system that we’re talking about. I don’t know if it’s obvious to me that a company would want to take the leap to creating an AGI that it would lose control of, because then it can’t capture the value from that system.
Pausing AI development
LEX. Is that one possible solution? Are you a proponent of pausing development of AI, whether it’s for six months or completely?
ROMAN. The condition would be not time but capabilities. Pause until you can do X, Y, Z. And if I’m right and you cannot, because it’s impossible, then it becomes a permanent ban. But if you are right and it is possible, then as soon as you have those safety capabilities, go ahead.
LEX. Can you help me understand what the hopeful path out of this is for you, solution-wise? It sounds like you’re saying AI systems in the end are unverifiable, unpredictable, and, as the book says, unexplainable, uncontrollable.
ROMAN. That’s the big one.
LEX. Uncontrollable, and all the other “un”s just make it difficult to avoid getting to the uncontrollable, I guess. But once it’s uncontrollable, then it goes wild. Surely there are solutions. Humans are pretty smart. What are possible solutions? Like, if you are a dictator of the world, what do we do?
ROMAN. So the smart thing is not to build something you cannot control, you cannot understand. Build what you can and benefit from it. I’m a big believer in personal self-interest. A lot of guys running those companies are young, rich people. What do they have to gain beyond billions they already have financially, right? It’s not a requirement that they press that button. They can easily wait a long time. They can just choose not to do it. And still have an amazing life. In history, a lot of times if you did something really bad, at least you became part of history books. There is a chance in this case there won’t be any history.
LEX. So you’re saying the individuals running these companies should do some soul searching and what? And stop development?
ROMAN. Well, either they have to prove that it’s possible to indefinitely control godlike super intelligent machines by humans, and ideally let us know how, or agree that it’s not possible and it’s a very bad idea to do it, including for them personally and their families and friends and capital.
ROMAN. In many domains, car manufacturing, drug development, the burden of proof is on the manufacturer of a product or service to show that their product or service is safe. It is not up to the user to prove that there are problems. They have to do appropriate safety studies, they have to get government approval for selling the product, and they are still fully responsible for what happens. We don’t see any of that here. They can deploy whatever they want, and I have to explain how that system is going to kill everyone. I don’t work for that company. You have to explain to me how it definitely cannot mess up.