“When you have a race to the bottom, it doesn’t matter who wins: everyone loses, because you make the unsafe system that helps your adversary, or causes economic problems, or is unsafe from an alignment perspective. The way I think about the race to the top is that it doesn’t matter who wins: everyone wins. The way the race to the top works is you set an example for how the field works. A key example of this is responsible scaling policies. We were the first to put out a responsible scaling policy, and we didn’t say everyone else should do this or you’re the bad guys. We didn’t try to use it to our advantage. We put it out, and then we encouraged everyone else to do it.” — Dario Amodei

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

54:15 Governance & safety: “race to the top” and the control debate

So, I’m sure you’ve heard the criticism from people like Jensen who say, “Well, Dario thinks he’s the only one who can build this safely and therefore, speaking of that word control, wants to control the entire industry.”

I’ve never said anything like that. That’s an outrageous lie. That’s the most outrageous lie I’ve ever heard.

By the way, I’m sorry if I got Jensen’s words wrong.

No, no, no. The words were correct. But the words are outrageous. In fact, I’ve said multiple times, and I think Anthropic’s actions have shown it, that we’re aiming for something we call a race to the top. I’ve said this on podcasts over the years. With a race to the bottom, everyone is competing to get things out as fast as possible. And so I say, when you have a race to the bottom, it doesn’t matter who wins: everyone loses, because you make the unsafe system that helps your adversary, or causes economic problems, or is unsafe from an alignment perspective. The way I think about the race to the top is that it doesn’t matter who wins: everyone wins. The way the race to the top works is you set an example for how the field works; you say, we’re going to engage in this practice. A key example of this is responsible scaling policies. We were the first to put out a responsible scaling policy, and we didn’t say everyone else should do this or you’re the bad guys. We didn’t try to use it to our advantage. We put it out, and then we encouraged everyone else to do it. And then we discovered in the months after that there were people within the other companies who were trying to put out responsible scaling policies, and the fact that we had done it gave those people permission, enabled them to make the argument to leadership: hey, Anthropic is doing this, so we should do it as well. The same has been true of investing in interpretability: we release our interpretability research to everyone and allow other companies to copy it, even though we’ve seen that it sometimes has commercial advantages. Same with things like constitutional AI, and same with the measurement of the dangers of our systems, dangerous-capability evals. So we’re trying to set an example for the field, but there’s an interplay where it helps to be a powerful commercial competitor. I’ve said nothing that anywhere near resembles the idea that this company should be the only one to build the technology. I don’t know how anyone could ever derive that from anything that I’ve said. It’s just an incredible and bad-faith distortion.

All right, let’s see if we can do a lightning round, one or two, before I ask you the last one, which we’ll have five minutes for. What happened with SBF? I mean, he was one of…

I couldn’t tell you. I probably met the guy four or five times, so I have no great insight into the psychology of SBF, or into why he did things as stupid or immoral as he did.
I think the only thing I had ever seen ahead of time with SBF was that a couple of people mentioned to me that he was hard to work with, that he was a bit of a move-fast-and-break-things guy. And I was like, okay, there are plenty of people like that.

Welcome to Silicon Valley.

Yeah, welcome to Silicon Valley. And so I remember saying, okay, I’m going to give this guy non-voting shares. I’m not going to put him on the board. He sounds like a bad person to deal with every day. But he’s excited about AI, he’s a bull on AI, and he’s interested in AI safety. So it seemed like a sensible thing to do. In retrospect, that “move fast and break things” turned out to be much, much more extreme and bad than I ever imagined.

Okay, so let’s end here. You found your impact; you’re living the dream pretty much right now. I mean, think about all the ways that AI can be used for biology, just for a start. You also say that this is a dangerous technology, and I’m curious if your desire for impact could be pushing you to accelerate this technology while potentially devaluing the possibility that controlling it might not be feasible.

I think I have warned about the dangers of the technology more than anyone else in the industry. We just spent ten or twenty minutes talking about the large array of people who run trillion-dollar companies criticizing me for talking about the dangers of these technologies. I have US government officials, I have people who run four-trillion-dollar companies, criticizing me for talking about the dangers of the technology, imputing all these bizarre motives that bear no relationship to anything I’ve ever said and aren’t supported by anything I’ve ever done. And yet I’m going to continue to do it. I actually think that as the economic business of AI ramps up, and it’s ramping up exponentially, if I’m right, in a couple of years it’ll be the biggest source of revenue in the world; it’ll be the biggest industry in the world. And people who run companies already think it. So we actually have this terrifying situation where hundreds of billions to trillions, I would say maybe twenty trillion, of capital is on the side of accelerating AI as fast as possible. And we have this company that’s very valuable in absolute terms but looks very small compared to that: sixty billion dollars. And I keep speaking up even if it upsets people. There have been these articles: some folks in the US government are upset at us, for example, for opposing the moratorium on AI regulation, for being in favor of export controls on chips to China, for talking about the economic impacts of AI. Every time I do that, I get attacked by many of my peers.

Right. But you’re still assuming that we can control it. That’s what I’m pointing out.
I’m just telling you how much effort, how much persistence, despite everything that’s stacked up, despite all the dangers, despite the risk to the company of being willing to speak up, I’m willing to do it. And that’s why I’m saying: look, if I thought that there was no way to control the technology… Some people are like, “Oh, you think there’s a five or ten percent chance that AI could go wrong; you’re just rolling the dice.” That’s not the way I think about it. This is a multi-step game. You take one step, you build the next step of most powerful models, you have a more intensive testing regime. As we get closer and closer to the more powerful models, I’m speaking up more and more, and I’m taking more and more drastic actions, because I’m concerned that the risks of AI are getting closer and closer. We’re working to address them, and we’ve made a certain amount of progress, but when I worry that the progress we’ve made on the risks is not going as fast as we need it to for the speed of the technology, then I speak up louder. And so you’re asking why I’m talking about this; you started this interview by asking what’s gotten into me. It’s because the exponential is getting to the point that I worry we may have a situation where our ability to handle the risk is not keeping up with the speed of the technology, and that’s how I’m responding to it. If I believed that there was no way to control the technology… and I see absolutely no evidence for that proposition. We’ve gotten better at controlling models with every model that we release. All these things go wrong, but you really have to stress-test the models pretty hard. That doesn’t mean you can’t have emergent bad behavior. And I think if we got to much more powerful models with only the alignment techniques we have now, then I’d be very concerned. Then I’d be out there saying everyone should stop building these things; even China should stop building them. I don’t think they’d listen to me, which is one reason I think export controls are a better measure. But if we got a few years ahead in models and had only the alignment and steering techniques we have today, then I would definitely be advocating for us to slow down a lot. The reason I’m warning about the risk is so that we don’t have to slow down, so that we can invest in safety techniques and continue the progress of the field. It would take a huge economic effort even if one company were willing to slow down the technology; that doesn’t stop all the other companies, and it doesn’t stop our geopolitical adversaries, for whom this is an existential fight for survival. So there’s very little latitude here. We’re stuck between all the benefits of the technology, the race to accelerate it, and the fact that it is a multi-party race. And so I am doing the best thing I can do, which is to invest in safety technology, to speed up the progress of safety.
I’ve written essays on the importance of interpretability and on how important various directions in safety are. We release all of our safety work openly because we think it’s a public good, the thing that everyone needs to share. So if you have a better strategy for balancing the benefits, the inevitability of the technology, and the risks it poses, I am very open to hearing it, because I go to sleep every night thinking about it, because I have such an incredible understanding of the stakes: in terms of the benefits, in terms of what it can do, the lives that it can save. I’ve seen that personally. I’ve also seen the risks personally. We’ve already seen things go wrong with the models; we have an example of that with Grok. People dismiss this, but they’re not going to laugh anymore when the models are taking actions, when they’re manufacturing, and when they’re in charge of medical interventions. People can laugh at the risks when the models are just talking, but I think it’s very serious. And so I think what this situation demands is a very serious understanding of both the risks and the benefits. These are high-stakes decisions; they need to be made with seriousness. Something that makes me very concerned is that on one hand we have a cadre of people who are just doomers. People call me a doomer; I’m not. But there are doomers out there, people who say they know there’s no way to build this safely. I’ve looked at their arguments; they’re a bunch of gobbledygook. The idea that these models have dangers associated with them, including dangers to humanity as a whole, makes sense to me. The idea that we can somehow logically prove that there’s no way to make them safe seems like nonsense to me. So I think that is an intellectually and morally unserious way to respond to the situation we’re in. I also think it is intellectually and morally unserious for people who are sitting on twenty trillion dollars of capital, who all work together because their incentives all point the same way, who have dollar signs in their eyes, to sit there and say we shouldn’t regulate this technology for ten years, and that anyone who says we should worry about the safety of these models is someone who just wants to control the technology themselves. That’s an outrageous claim, and it’s a morally unserious claim. We’ve sat here and done every possible piece of research. We speak up when we believe it’s appropriate to do so. We’ve tried to back up our claims about the economic impact of AI: we have an economic advisory council, we have an economic index that we use to track the models’ impact in real time, and we’re giving grants for people to understand the economic impact of the technology. For people who are far more financially invested in the success of the technology than I am to just breezily lob ad hominem attacks, I think that is just as intellectually and morally unserious as the doomers’ position. I think what we need here is more thoughtfulness, more honesty, more people willing to go against their interests, willing not to have breezy Twitter fights and hot takes.
We need people to actually invest in understanding the situation, actually do the work, actually put out the research, and actually add some light and some insight to the situation that we’re in. I am trying to do that. I don’t think I’m doing it perfectly, as no human can; I’m trying to do it as well as I can. It would be very helpful if there were others who would try to do the same thing.

Well, Dario, I said this off camera, but I want to make sure to say it on camera as we’re wrapping up: I appreciate how much Anthropic publishes. We have learned a ton from the experiments, everything from red-teaming the models to vending-machine Claude, which we didn’t have a chance to speak about today. I think the world is better off for being able to hear everything going on here. And to that note, thank you for sitting down with me and spending so much time together.

Thanks for having me.

Thanks, everybody, for listening and watching, and we’ll see you next time on Big Technology Podcast.

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.