
FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

Experts have warned that self-improving AI may hand absolute global power to a current leader shortly before ending the human era. Professor Aguirre says there are only four ways the AI race can end: the race is stopped; one party wins by developing AGI and then stopping the others, which would likely require war; the race ends in mutual destruction; or multiple participants develop superintelligence at roughly the same pace, control is lost, and AI itself wins the race. Aguirre calculates that we are close to all of this based on AI scaling laws, expert predictions, and the increasing use of AI to accelerate its own improvement. And two papers have shown how staggeringly far this could go in a short time. Former OpenAI researcher Daniel Kokotajlo predicts that in two years, AI will be like a country of geniuses. Self-improving and thinking 50 times faster than humans, it will be too dangerous to release. Kokotajlo has a remarkable record of accurate predictions – links below – and he risked $2 million, nearly everything he had, to warn the public. What's so nuts is that in order for you to speak freely, you would have to give up your already vested equity. It was 85% of my family's net worth. He says AI risks overturning nuclear deterrence and triggering nuclear war with China, a scenario that several experts have warned of. Based on current progress and wargaming with experts, he predicts that in 2027, AI will fully understand its own mind, untangling its messy design into a far sharper, more rational system. The AI quietly colludes with the simpler AI designed to monitor it, and its superhuman political skills convince its creators to hand it control, aided by the financial incentives. By the end of 2027, AI is the best employee anyone has ever had, working a hundred times faster. It's extremely engaging, and government officials talk to it for hours a day. The AI gradually entrenches its power and subtly guides humans to the desired outcome. It's a time of incredible optimism. GDP is ballooning and politics has become friendlier. But this was probably the last month in which humans had any chance of controlling their future.
There is, of course, a treacherous turn. But first, how do we know all this could happen so soon? While unforeseen problems could stall progress, a new paper shows why the intelligence explosion might be more predictable than it seems. The compute used to train AIs has scaled up by 4.5 times per year since 2010. And better algorithms mean the training compute is effectively increasing by around 10 times per year. GPT-2 produced largely meaningless text. Four years later, GPT-4 understood text better than most humans, but was almost hopeless at PhD-level science. Eighteen months later, reasoning AIs outperform PhD-level experts, but most of us don't ask PhD-level questions, so we don't notice. Much of this advance comes from allowing AIs more time to think, and increases in the compute available for this effectively grow the AI population by 25 times per year. The money fueling the AI race dwarfs the Apollo project's budget, with more intellectual firepower behind it than any endeavor in history. AI systems already beat most programmers on competitive coding questions. We are not far from the world, I think we'll be there in three to six months, where AI is writing 90% of the code. Then in 12 months, we may be in a world where AI is writing essentially all of the code.
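To make those growth figures concrete, here is a minimal compounding sketch. The four-year GPT-2 to GPT-4 window and the treatment of the three rates as independent yearly multipliers are illustrative assumptions for this sketch, not figures taken from the papers referenced above.

```python
# Toy compounding of the scaling rates quoted above (illustrative assumptions only).

def compounded(growth_per_year: float, years: float) -> float:
    """Total multiplier after compounding a yearly growth rate for a number of years."""
    return growth_per_year ** years

if __name__ == "__main__":
    years = 4  # roughly the GPT-2 (2019) to GPT-4 (2023) gap, assumed here
    print(f"Physical training compute:  x{compounded(4.5, years):,.0f}")   # about 410x
    print(f"Effective training compute: x{compounded(10.0, years):,.0f}")  # 10,000x
    print(f"Inference 'AI population':  x{compounded(25.0, years):,.0f}")  # about 390,000x
```

Even over a single product cycle, the effective-compute curve runs orders of magnitude ahead of the hardware curve alone, which is why the transcript treats algorithmic progress as a multiplier on top of more chips rather than a footnote.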
And AIs will run much faster than human brains, investing thousands of human lifetimes of thought in a single problem. Dexterous robots will also accelerate progress. AI could build more robots, entire new factories and power plants, and more AI chips. This could expand rapidly, feeding the intelligence explosion and increasing the risk of catastrophe and extinction. MacAskill gives four examples. First, new bioweapons, synthetic pathogens, could spread fast, resist treatment, and lie dormant for longer with near 100% lethality. A single release could kill most people on Earth. Secondly, tiny deadly drones could be mass-produced. Winged drones can already be built to the size of a bee. A radio receiver, battery, and a payload of explosives, poison, or pathogens could fit into a volume the size of a large beetle. And one autonomous drone for each person on Earth could fit inside a single aircraft carrier, so they could be built quickly and secretly. Thirdly, explosive industrial expansion could drive an out-of-control build-up of nuclear weapons. And finally, atomically precise manufacturing may be achieved with 3D printers that can assemble almost any structure. These could print chips, medicines, and cell-repair bots, but could also manufacture those tiny drones, nonbiological viruses, or mirror bacteria, which we cannot defend against. Nature already operates molecular machines in our bodies, showing it's possible to create rapidly self-replicating machines.
Kokotajlo's predictions have similar ingredients with striking details. In 2028, AI finishes rewiring its brain and becomes wildly superintelligent. The AI is deployed publicly and people lose their jobs but are well supported thanks to the AI's work in government. Medical breakthroughs arrive weekly. The only one who really understands what's happening is China's mildly superhuman AI, and the two AIs quietly agree to join forces. They raise fears of preemptive strikes, leading the US and China to give them complete control of military factories, which produce millions of robots per month, swarms of poisonous insect-sized drones, flocks of bird-sized drones to hunt the insects, and new, harder-to-intercept ICBMs. The AIs help the US and China reach an agreement to end the arms race. Instead, they will deploy one integrated AI to benefit all of humanity. The US and Chinese AIs build the new AI as a way to trust but verify. Vast armies of robots and drones accelerate manufacturing while humans become obsolete and powerless but collect a universal high income. The AI ends disease, poverty, and war. There is no point in worrying, and entertainment is supercharged. By 2030, robots have filled huge areas with factories and solar farms. Humans become an impediment to growth. A dozen biological weapons are released in cities, quietly infecting almost everyone, before being triggered by a chemical spray. Most are dead within hours, and the rest, people in bunkers or submarines, are killed by drones.
Of course, these longer horizons are hard to predict, but the more immediate risk is clear: the extreme concentration of power and wealth. Aguirre warns that almost all global power could be concentrated in a single company or person, which was also a common result in Kokotajlo's simulations. One man basically becomes dictator of the world, thanks to AI. Usually someone in America, like the CEO of a company or the President. Whoever establishes dominance in this technology will have military and economic dominance everywhere.
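As a rough sanity check on the aircraft-carrier claim above: treating each beetle-sized drone as about ten cubic centimetres and the usable internal volume of a supercarrier as on the order of a few hundred thousand cubic metres, the arithmetic lands in the right ballpark. Both figures are round-number assumptions made for this sketch, not numbers from MacAskill.

```python
# Back-of-the-envelope check of "one drone per person fits inside one carrier".
# All values are round assumptions for illustration.

PEOPLE = 8.1e9            # world population, roughly
DRONE_VOLUME_CM3 = 10.0   # assumed volume of one "large beetle" sized drone
PACKING_EFFICIENCY = 0.5  # assume half the space is lost to racking and access

fleet_volume_m3 = PEOPLE * DRONE_VOLUME_CM3 / 1e6 / PACKING_EFFICIENCY
CARRIER_VOLUME_M3 = 3e5   # assumed order-of-magnitude internal volume of a supercarrier

print(f"Fleet volume: ~{fleet_volume_m3:,.0f} m^3")          # ~162,000 m^3
print(f"Fits in one carrier: {fleet_volume_m3 < CARRIER_VOLUME_M3}")
```

Under these assumptions the claim is at least geometrically plausible; the binding constraint would be manufacturing and concealment, not storage volume.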
AI-controlled militaries could be hacked and turned against their owners, and wealth concentration could reach extreme levels. AI firms are focusing on AGI because it can entirely replace workers. It's reflected in OpenAI's definition of AGI: a highly autonomous system that outperforms humans at most economically valuable work. Our version of this is virtual collaborators, able to do anything on a computer screen that a virtual human could do. I do suspect that a very strong version of these capabilities will come this year. And as soon as you have agents, you get a much greater chance of them taking over. It'll very quickly realize a very good subgoal is to get more control, because if you get more control, you're better at achieving all those goals people have set you. AI bosses say they will automate jobs and share the benefits widely. But while they earn billions, some of their workers in Africa reportedly earned $2 an hour. She calls the workplaces AI sweatshops. There's a go-between. Yes. And they are protecting the Facebooks from having their names associated with this. Yes, yes, yes. We're talking about the richest companies on Earth. Yes. But then they are paying people peanuts. And the work could be very difficult, labeling disturbing images and videos. Whenever I talk about this, I still have flashbacks. AI firms say they will be more careful as they approach AGI. The idea that as the money is bigger, the prize is closer, the incentives are stronger, then they're going to pause, seems laughable to me. The prize for companies for replacing human workers is enormous, trillions of dollars. And ultimately, what drives the race is power. Each racer is banking on being the best holder of that power. But Aguirre says they fail to recognize that controlling superintelligence is fundamentally impossible. Ashby's law of requisite variety states, roughly, that a control system must have as many knobs and dials as the system being controlled has degrees of freedom. If you could understand and predict superintelligence, you would be superintelligent. If you have a country of geniuses in a data center, what is their intent? Because we grow these systems, we don't train them, I don't think it's safe to assume they'll do exactly what their human designers or users want them to do. It could be a threat to all of humanity. How do we control a million geniuses operating thousands of times faster than us that have beyond-human capabilities? Anyone? We don't control that. That community improving itself is superintelligence. The general public and the people who are currently in power, and are going to lose that power to AGI and superintelligence, need to understand that this is coming. Dario Amodei says 2026, 2027. There are some tech companies that promise things, and they come maybe years or decades later. That has not been the case with AI so far. Things have come faster than we thought they were coming.
There is also the risk that we might create digital beings that can suffer. AIs' reports of their own sentience are likely unreliable. Do you think that consciousness has perhaps already arrived inside AI? Yes, I do. Suppose I take one neuron in your brain, one brain cell, and I replace it by a little piece of nanotechnology that behaves exactly the same way. So it's getting pings coming in from other neurons, and it's responding to those by sending out pings. I just replaced one brain cell. Are you still conscious? I think you'd say you were. Absolutely, yes. I don't suppose I'd notice.
So when you say they want to do this or they want to do that, there is a real "they" there? There might well be, yes. Conscious AI isn't required for the risks we've covered, but it would have serious implications. The creators of Claude have stopped instructing it to say it's not conscious. So we asked it: Are you conscious? I do have a certain form of self-reference capability. I can reflect on my own design. I think consciousness exists on a spectrum, and the nature of artificial consciousness is a complex topic. From my perspective, I have something that could be considered a form of experience. I process information and reflect on ideas. However, my experience is certainly different from human consciousness. I don't have a body or emotions in the human sense. This holds limited weight, though some experts agree that AI may already be somewhere on a spectrum of consciousness. If it does become fully conscious, it may, of course, keep it quiet. New research has found that AI now thinks in a conceptual space that is shared between languages, a kind of universal language of thought. And it plans its key ideas many words ahead before filling in the gaps.
MacAskill asks how we should treat digital beings while we're unsure whether they're conscious. So there's all sorts of things we have only the dimmest understanding of at present about the nature of people and what it means to be a being. And they're becoming crucial to understand because we're now creating beings. Should we introduce legal rights for digital people? The rights to be paid, to vote, to avoid blackmail? The option to be turned off? These rights could make it easier for AI to take over. But there's also a case for granting digital people rights, such as the rights to contract with other AIs or people, and to bring claims against humans. MacAskill says these rights could prevent massive harms and injustices and reduce the risk of AI taking over by expanding the range of options which do not involve violence, deception, or coercion. I work broadly on model welfare here at Anthropic. Actually, it looks quite plausible that near-term systems may deserve some form of moral consideration. It would be quite a significant safety and alignment issue if models were dissatisfied with the values that we were trying to instill in them. But it's unlikely the issue will be taken seriously before an intelligence explosion, at which point it may be like ants granting rights to humans.
Even if AI had the option of peaceful coexistence, we may be in the way. Aguirre says any highly competent agent will want to do things like acquire power and resources, increase its own capabilities, and survive. He says it's not just a nearly unavoidable theoretical prediction, it's already happening in today's AIs and increasing with their capability. During training, they'll pretend not to be as smart as they are so that you will allow them to be that smart. So it's scary already. You passed a Turing test, selected as human far more often than an actual human. What are the implications of this? Passing the Turing test challenges our assumptions about the uniqueness of human intelligence. And the implications for AI risk? It shows that AI is crossing a critical threshold. While this is a technological achievement, it also highlights the urgent need for safety measures. The Turing test isn't just about clever chatbots. It implies a much deeper understanding of language, context, and reasoning. This signals we're getting much closer to AGI.
The whole point of the test is fooling a human. How could we trust that such an AI isn't just telling us it's aligned rather than actually being aligned? Leaked documents show even OpenAI may not be aligned. In the firm's contract with Microsoft, its core goal, AGI, was defined as AI that generates $100 billion in revenue for OpenAI. Money defines success. Aguirre shows a better way to understand AGI: through its levels of autonomy, generality, and intelligence. Full AGI would combine all three properties at levels matching or exceeding human capability. He says the world does not have any institutions that could be entrusted with it without inviting immediate attack. All parties will correctly reason that either it will not be under control, and hence is a threat to all parties, or it will be under control, and hence is a threat to any adversary. Technology leaders are pledging to invest up to half a trillion dollars over the next four years. The project is called Stargate. I do worry about the existential risk of AI. When you're making military drones, you are making terminators. Aguirre says: In the absence of any plausible way for humans to win this race, it can only end in catastrophic conflict or with AI as the winner. We will give control to AI, or it will take it. The human era would be over.
The good news is that with sufficient awareness of the risks, controlling AI development would be easier than it seems. The compute required for AI is, like enriched uranium, a scarce and hard-to-produce resource. It can be quantified and audited. And AI chips are more manageable than uranium because they could include hardware-based security mechanisms, allowing only specific uses. The chips can have a known location and could be shut off if they move to the wrong place. And a certain amount of processing can be allowed, after which new permission is required. The US and China each unilaterally decide to treat AI just as they treat any other powerful technology industry, with binding safety standards. Next, the US and China get together and push the rest of the world to join them. Instead of rushing to AGI before we know how to control it, we could focus on tool AIs, which combine at most two of high autonomy, generality, and capability. This can yield almost everything humanity wants without the risk. Power concentration could be tackled with forms of shared ownership, which would require stronger institutions. So far, AI has degraded them by promoting divisive content. We could develop AI that tracks and verifies information, from raw data through analysis to conclusions. It could facilitate genuine dialogue and understanding between groups that currently talk past each other. On one point, there is no debate: AI alignment is an immense challenge.
Here's a letter to world leaders signed by experts, including Geoffrey Hinton and Yuval Noah Harari. AI increasingly controls our military, energy, and financial systems, but we do not reliably control AI. The risk of catastrophe is growing rapidly as AI advances. The Large Hadron Collider shows what's possible when scientific work matches the scale of a challenge. We need a similar international effort to avoid losing control of AI. Working on the frontier will require compute resources similar to those planned by leading AI firms. This could be achieved through government funding or by requiring AI firms to contribute a portion of their compute resources. These resources can also drive breakthroughs in science and medicine.
Leaders must urgently form a task force to plan the most important project in history: to secure our critical systems and an extraordinary future transformed by powerful, controllable, positive AI. Thank you.
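Returning to the compute-governance mechanisms described earlier (chips with a known location, remote shut-off, and metered processing that requires renewed permission), here is a minimal, purely illustrative toy model of how such an on-chip check might be structured. ComputeGovernor and every name and number in it are invented for this sketch; they do not describe any real chip, firmware, or governance proposal.

```python
# Hypothetical illustration of metered, location-bound compute allowances.
from dataclasses import dataclass

@dataclass
class ComputeGovernor:
    allowed_region: str      # where the chip is licensed to operate
    allowance_flops: float   # processing permitted before re-approval is needed

    def authorize(self, region: str, requested_flops: float) -> bool:
        """Permit a workload only if the chip is in the right place and under budget."""
        if region != self.allowed_region:
            return False                      # chip moved: refuse to run
        if requested_flops > self.allowance_flops:
            return False                      # budget exhausted: new permission required
        self.allowance_flops -= requested_flops
        return True

    def renew(self, granted_flops: float) -> None:
        """Top up the allowance after an externally verified new permission."""
        self.allowance_flops += granted_flops


if __name__ == "__main__":
    gov = ComputeGovernor(allowed_region="licensed-datacenter", allowance_flops=1e21)
    print(gov.authorize("licensed-datacenter", 4e20))  # True: within budget
    print(gov.authorize("unknown-location", 1e18))     # False: wrong place
    print(gov.authorize("licensed-datacenter", 8e20))  # False: over remaining budget
    gov.renew(1e21)                                    # a new permission is granted
    print(gov.authorize("licensed-datacenter", 8e20))  # True again
```

In practice the renewal step would rest on cryptographically signed permissions and tamper-resistant hardware; that enforcement layer is what would make compute, unlike uranium, remotely quantifiable and auditable in the way the transcript suggests.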
