“If the US fights China in an AGI race, the only winners will be machines.” — Max Tegmark

Many AI pundits have US-China AI collaboration totally backwards. The point is *not* for each side to convince the other to shun uncontrollable AGI, but to jointly ensure that the rest of the world follows suit, including countries with weak or corrupt governments.

• It’s not in the US self-interest to disempower itself and all its current power centers by allowing a US company to build uncontrollable AGI.

• It’s not in the interest of the Chinese Communist Party to disempower itself by allowing a Chinese company to build uncontrollable AGI.

Once the US and Chinese leaderships serve their self-interest by preventing uncontrollable AGI at home, they have a shared incentive to coordinate on doing the same globally. The reason this self-interest hasn’t yet played out is that US and Chinese leaders still haven’t fully understood the game-theoretic payoff matrix: the well-funded and hopium-fueled disinformation campaign claiming that Turing, @geoffreyhinton, @Yoshua_Bengio, Russell, @ESYudkowsky et al. are wrong (that we’re likely to figure out how to align/control AGI in time if we “scale quickly”) has been massively successful. That success is unsurprising, given how successful similar disinformation campaigns were for, e.g., tobacco, asbestos and leaded gasoline; the only difference is that the stakes are much higher now. I wrote an essay elaborating on this suicide-race dynamic: lesswrong.com/posts/oJQnRDbg

As humanity gets closer to Artificial General Intelligence (AGI), a new geopolitical strategy is gaining traction in US and allied circles, in the NatSec, AI safety and tech communities. Anthropic CEO Dario Amodei and RAND Corporation call it the “entente”, while others privately refer to it as “hegemony” or “crush China”. I will argue that, irrespective of one’s ethical or geopolitical preferences, it is fundamentally flawed and against US national security interests.

The entente strategy

Amodei articulates key elements of this strategy as follows:

“a coalition of democracies seeks to gain a clear advantage (even just a temporary one) on powerful AI by securing its supply chain, scaling quickly, and blocking or delaying adversaries’ access to key resources like chips and semiconductor equipment. This coalition would on one hand use AI to achieve robust military superiority (the stick) while at the same time offering to distribute the benefits of powerful AI (the carrot) to a wider and wider group of countries in exchange for supporting the coalition’s strategy to promote democracy (this would be a bit analogous to “Atoms for Peace”). The coalition would aim to gain the support of more and more of the world, isolating our worst adversaries and eventually putting them in a position where they are better off taking the same bargain as the rest of the world: give up competing with democracies in order to receive all the benefits and not fight a superior foe.”

[…]

“This could optimistically lead to an ‘eternal 1991’—a world where democracies have the upper hand and Fukuyama’s dreams are realized.”

Note the crucial point about “scaling quickly”, which is nerd-code for “racing to build AGI”. The question of whether this argument for “scaling quickly” is motivated by self-serving desires to avoid regulation deserves a separate analysis, and I will not comment on it further here except to note that most other industries, from big tobacco to big oil, have produced creative anti-regulation arguments to defend their profits.

Why it’s a suicide race

If the West pursues this entente strategy, it virtually guarantees that China will too, which in turn virtually guarantees that both sides will cut corners on safety to try to “win” the race. The key point I will make is that, from a game-theoretic point of view, this race is not an arms race but a suicide race. In an arms race, the winner ends up better off than the loser, whereas in a suicide race, both parties lose massively if either one crosses the finish line. In a suicide race, “the only winning move is not to play”, as the AI concludes at the end of the movie WarGames.
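To make this game-theoretic point concrete, here is a toy payoff matrix with purely illustrative numbers (rows are one side’s choices, columns the other’s; each entry lists the payoffs to (row, column)):

\[
\textbf{Arms race:}\quad
\begin{array}{c|cc}
 & \text{refrain} & \text{race} \\ \hline
\text{refrain} & (0,\,0) & (-10,\,+10) \\
\text{race} & (+10,\,-10) & (-5,\,-5)
\end{array}
\qquad\qquad
\textbf{Suicide race:}\quad
\begin{array}{c|cc}
 & \text{refrain} & \text{race} \\ \hline
\text{refrain} & (+10,\,+10) & (-100,\,-100) \\
\text{race} & (-100,\,-100) & (-100,\,-100)
\end{array}
\]

In the arms-race matrix, racing pays off for whoever wins, so each side is tempted to race. In the suicide-race matrix, crossing the finish line means uncontrollable AGI that costs both sides essentially everything regardless of who built it, so refraining is the only sensible choice for both.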

Why is the entente a suicide race? Because we are closer to building AGI than we are to figuring out how to align or control it.

There is some controversy around how to define AGI. I will stick with the original definition from Shane Legg: AI capable of performing essentially all cognitive tasks at least at human level. This is similar to OpenAI’s stated goal of automating essentially all economically valuable work.

Although it is highly controversial how close we are to AGI, it is uncontroversial that timelines have shortened. Ten years ago, most AI researchers predicted that something as advanced as GPT-4 was decades away. Five years ago, the median prediction among AI researchers was that AGI was decades away. The Metaculus prediction for weak AGI has now dropped to 2027. In his influential Situational Awareness piece, Leopold Aschenbrenner argues that AGI by 2027 is strikingly plausible, and Dario Amodei has made a similar prediction. Sam Altman, Demis Hassabis, Yoshua Bengio, Geoff Hinton and Yann LeCun have all recently described AGI in the next 5-15 years as likely. Andrew Ng and many others predict much longer timelines, but we clearly cannot discount the possibility that it happens in the coming decade. Precisely when we get AGI is irrelevant to my argument, which is simply that it will probably happen before the alignment/control problem is solved. Just as it turned out to be easier to build flying machines than mechanical birds, it has turned out to be easier to build thinking machines than to understand and replicate human brains.

In contrast, the challenge of building aligned or controllable AGI has proven harder than many researchers expected, and there is no end in sight. AI Godfather Alan Turing argued in 1951 that “once the machine thinking method had started, it would not take long to outstrip our feeble powers. At some stage therefore we should have to expect the machines to take control.” This sounds like hyperbole if we view AGI as merely another technology, like the steam engine or the internet. But he clearly viewed it more as a new smarter-than-human species, in which case AGI taking over is indeed the default outcome unless some clever scheme is devised to prevent it. My MIT research group has pursued AI safety research since 2017, and based on my knowledge of the field, I consider it highly unlikely that such a clever scheme will be invented in time if we simply continue “scaling quickly”.

That is not to say that nobody in big tech claims they will solve the problem in time. But given the track record of the companies that sold tobacco, asbestos, leaded gasoline and fossil fuels while downplaying the risks of their products, it is prudent to scrutinize such claims scientifically.

The two traditional approaches are either to figure out how to control something much smarter than us, via formal verification or other techniques, or to make control unnecessary by “aligning” it: ensuring that it has goals aligned with humanity’s best interests, and that it will retain these goals even if it recursively self-improves its intelligence from roughly human level to the astronomically higher levels allowed by the laws of physics.

There has been a major research effort on “alignment” redefined in a much narrower way: ensuring that a large language model (LLM) does not produce outputs deemed harmful, such as offensive slurs or bioweapon instructions. But most of this work has involved training LLMs not to say bad things rather than not to want bad things. This is like training a hard-core Nazi never to say anything revealing his Nazi views: does this really solve the problem, or simply produce deceptive AI? Many AI systems have already been found to be deceptive, and current LLM black-box evaluation techniques are likely to be inadequate. Even if alignment can be achieved in this narrowly redefined sense, it is clearly a far cry from what is needed: aligning the goals of AGI.

If your reaction is “Machines can’t have goals!”, please remember that if you are chased by a heat-seeking missile, you do not care whether it is “conscious” or has “goals” in any anthropomorphic sense; you care only that it is, in effect, trying to kill you.

We still do not understand how to properly measure, or even define, what goals are in an LLM: although its training objective is simply to predict the next word or token, succeeding at that objective requires accurately modeling the goals of the people who produced those words or tokens, effectively simulating various human goal-oriented behaviors.
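For concreteness, the training objective in question is just the standard autoregressive cross-entropy loss

\[
\mathcal{L}(\theta) \;=\; -\sum_{t}\log p_{\theta}\!\left(x_{t}\mid x_{<t}\right),
\]

which rewards the model for predicting each token \(x_t\) from its context \(x_{<t}\). Nothing in this formula mentions goals, which is precisely the point: goal-like behavior can emerge as a byproduct of imitating goal-directed humans, without our having any agreed-upon way to measure or specify it.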

As if this were not bad enough, it is now rather obvious that the first AGI will not be a pure LLM, but a hybrid scaffolded system. Today’s most capable AIs are already hybrids, in which LLMs are scaffolded with long-term memory, code compilers, databases and other tools that they can use, and in which the outputs are not raw LLM outputs, but rather the result of multiple calls to LLMs and other systems. It is highly likely that this hybridization trend will continue, combining the most useful aspects of neural network-based AI with traditional symbolic AI approaches. The research on how to align or control such hybrid systems is in such a primitive state that it would be an exaggeration even to call it a coherent research field. “Scaling quickly” is therefore overwhelmingly likely to produce AGI before anyone figures out how to control or align it. It does not help that the leading AI companies devote far fewer resources to the latter than to the former, and that many AI safety team members have resigned, saying that their company did not sufficiently prioritize the alignment/control problem. Horny couples know that it is easier to make a human-level intelligence than to raise and align it, and it is likewise easier to make an AGI than to figure out how to align or control it.
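To make “scaffolding” concrete, here is a minimal sketch of such a hybrid system. Everything in it is hypothetical and simplified for illustration: the class name, the tool interface and the control loop do not correspond to any particular company’s product, and real systems are far more elaborate.

```python
# Minimal sketch of a "scaffolded" hybrid AI system: an LLM wrapped in a control
# loop with long-term memory and external tools. All names are hypothetical and
# chosen purely for illustration.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class HybridAgent:
    llm: Callable[[str], str]                                             # stand-in for one raw LLM call
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)  # e.g. search, code execution, database
    memory: List[Tuple[str, str, str]] = field(default_factory=list)      # persists across calls

    def run(self, task: str, max_steps: int = 10) -> str:
        """Repeatedly call the LLM, letting it invoke tools, until it gives an answer."""
        context = f"Task: {task}\nMemory: {self.memory}"
        for _ in range(max_steps):
            reply = self.llm(context)              # one of possibly many LLM calls
            if reply.startswith("TOOL:"):          # the LLM asked to use a tool
                _, name, arg = reply.split(":", 2)
                result = self.tools[name](arg)     # behavior now depends on non-LLM code
                self.memory.append((name, arg, result))
                context += f"\n{name}({arg}) -> {result}"
            else:
                return reply                       # the final output is not a raw LLM output
        return "No answer within the step budget."
```

The point of the sketch is that the system’s behavior is shaped by the loop, the memory and the tools, not just by the LLM’s weights, which is why safety results about raw LLMs do not straightforwardly transfer to the hybrid systems actually being deployed.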

If you disagree with my assertion, I challenge you to cite or openly publish an actual plan for aligning or controlling a hybrid AGI system. If companies claim to have a plan that they do not want their competitors to see, I will argue that they are lying: if they lose the AGI race, they are clearly better off if their competitors align/control their AGI instead of Earth getting taken over by unaligned machines. 

Loss-of-control

If you dismiss the possibility that smarter-than-human bots can take over Earth, I invite you to read the work of Amodei, Aschenbrenner and others pushing the “entente” strategy: they agree with me that this is possible, and merely predict that they will not be the ones to lose control. I also invite you to read the arguments for loss-of-control by, e.g., the three most cited AI researchers in history: Geoff Hinton, Yoshua Bengio and Ilya Sutskever. If you downweight similar claims from Sam Altman, Demis Hassabis and Dario Amodei on the grounds that they have an incentive to overhype their technology for investors, please consider that such conflicts of interest do not apply to their investors, to the aforementioned academics, or to the whistleblowers who have recently imperiled their stock options by warning about what their AGI company is doing.

In his entente manifesto, Amodei hopes that it will lead to an “eternal 1991”. I have argued that it is more likely to lead to an “eternal 1984”, with a non-human Big Brother.

There is a small but interesting “replacement” school of thought that agrees that loss-of-control is likely, but views it as a good thing if humanity loses control and gets fully replaced by smarter-than-human AI, seen simply as the worthy next stage of evolution. Its prominent supporters include Richard Sutton (“Why shouldn’t those who are the smartest become powerful?”) and Guillaume (“Beff Jezos”) Verdon, who describes himself as a “post-humanist” and whose e/acc movement professes “no particular allegiance to the biological substrate”. Investor and e/acc supporter Marc Andreessen writes “We actually invented AI, and it turns out that it’s gloriously, inherently uncontrollable”. Although I respect them as intellectuals, I personally disagree with what I consider an anti-human agenda. I believe that all of humanity should have a say in humanity’s destiny, rather than a handful of tech bros and venture capitalists sealing its fate.

Do you personally want our human species to end during your lifetime or that of your loved ones? I predict that if the pro-replacement school ran a global referendum on this question, they would be disappointed by the result.

A better strategy: tool AI

Above I have argued that the “entente” strategy is likely to lead to the overthrow of the US government, and of all other current human power centers, by unaligned smarter-than-human bots. Let me end by proposing an alternative strategy that I will argue is better both for US national security and for humanity as a whole.

Let us define “tool AI” as AI that we can control and that helps us accomplish specific goals. Almost everything that we are currently excited about using AI for can be accomplished with tool AI. Tool AI just won the Nobel Prize for its potential to revolutionize medicine. Tool AI can slash road deaths through autonomous driving. Tool AI can help us achieve the UN Sustainable Development Goals faster, enabling healthier, wealthier and more inspiring lives. Tool AI can help stabilize our climate by accelerating development of better technologies for energy generation, distribution and storage. Today’s military AI is also tool AI, because military leadership does not want to lose control of its technology. Tool AI can help produce an abundance of goods and services more efficiently. Tool AI can help us all be our best through widely accessible customized education.

Like most other human tools, tool AI also comes with risks that can be managed with legally binding safety standards: In the US, drugs can be sold once they meet FDA safety standards, airplanes can be sold once they meet FAA safety standards, and food can be sold in restaurants meeting the standards of municipal health inspectors. To minimize red tape, safety standards tend to be tiered, with little or no regulation on lower-risk tools (e.g. hammers), and more on tools with greater harm potential (e.g. fentanyl).

The US, China and virtually every other country have adopted such safety standards for non-AI tools out of national self-interest, not as a favor to other nations. It is therefore logical for individual countries to similarly adopt national safety standards for AI tools. The reason that AI is virtually the only US industry lacking national safety standards is not that the US is historically opposed to safety standards, but simply that AI is a newcomer technology and regulators have not yet had time to catch up.

Here is what I advocate for instead of the entente strategy.
The tool AI strategy: Go full steam ahead with tool AI, allowing all AI tools that meet national safety standards.

Once national safety standards were in place for, e.g., drugs and airplanes, national regulators found it useful to confer with international peers, both to compare notes on best practices and to explore mutually beneficial opportunities for harmonization, making it easier for domestic companies to get their exports approved abroad. It is therefore likely that analogous international coordination will follow after national AI safety standards are enacted in key jurisdictions, along the lines of the Narrow Path plan.

What about AGI? AGI currently does not meet the definition of tool AI, since we do not know how to control it. This gives AGI corporations a strong incentive to devote resources to figuring out how it can be controlled or aligned. If and when they succeed, they can make great profits from it. In the meantime, AGI deployment is paused in the same sense that sales are paused for drugs that have not yet been FDA-approved.

Current safety standards for potentially very harmful products are quantitative: FDA approval requires quantifying benefit and side effect percentages, jet engine approval requires quantifying the failure rate (currently below 0.0001% per hour) and nuclear reactor approval requires quantifying the meltdown risk (currently below 0.0001% per year). AGI approval should similarly require quantitative safety guarantees. For extremely high-risk technology, e.g., bioengineering work that could cause pandemics, safety standards apply not only to deployment but also to development, and development of potentially uncontrollable AGI clearly falls into this same category.
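To make concrete what a quantitative bound such as “below 0.0001% per hour” implies, here is a back-of-the-envelope conversion; the hourly bound is taken from the figures above, while the flight duration and the assumption of independence across hours are illustrative simplifications.

```python
# Back-of-the-envelope arithmetic for quantitative safety bounds of the kind cited
# above. The hourly bound matches the "below 0.0001% per hour" figure in the text;
# the flight duration and the independence assumption are illustrative only.

HOURLY_BOUND = 1e-6       # 0.0001% per hour, expressed as a probability
FLIGHT_HOURS = 10         # assumed long-haul flight
HOURS_PER_YEAR = 8760     # one year of continuous operation

# Probability of at least one failure, assuming independent hours:
per_flight = 1 - (1 - HOURLY_BOUND) ** FLIGHT_HOURS    # roughly 1e-5
per_year = 1 - (1 - HOURLY_BOUND) ** HOURS_PER_YEAR    # roughly 0.9%

print(f"Failure-risk bound per 10-hour flight: {per_flight:.1e}")
print(f"Failure-risk bound per year of continuous operation: {per_year:.2%}")
```

The broader point is that a quantitative standard forces both a risk bound and the exposure over which it applies to be stated explicitly, and an analogous AGI standard would have to do the same.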

In summary, the tool AI strategy involves the US, China and other nations adopting AI safety standards purely out of national self-interest, which enables them to prosper with tool AI while preventing their own companies and researchers from deploying unsafe AI tools or AGI. Once the US and China have independently done this, they have an incentive to collaborate not only on harmonizing their standards, but also on jointly strong-arming the rest of the world to follow suit, preventing AI companies from skirting their safety standards in less powerful countries with weak or corrupt governments. This would leave both the US and China (and the rest of the world) vastly wealthier and better off than today, by a much greater factor than if one side had been able to increase their dominance from their current percentage to 100% of the planet. Such a prosperous and peaceful situation could be described as a detente.

In conclusion, the potential of tool AI is absolutely stunning and, in my opinion, dramatically underrated. In contrast, AGI does not add much value at the present time beyond what tool AI will be able to deliver, and certainly not enough value to justify risking permanent loss of control of humanity’s entire future. If humanity needs to wait another couple of decades for beneficial AGI, it will be worth the wait – and in the meantime, we can all enjoy the remarkable health and sustainable prosperity that tool AI can deliver.

Learn more:

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
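As a rough structural sketch of how the three components described in this abstract fit together, consider the following; all names are hypothetical, the verifier shown is a toy stand-in, and producing a genuine machine-checkable proof certificate is precisely the open research problem.

```python
# Rough structural sketch of the "guaranteed safe AI" recipe summarized above:
# a world model, a safety specification, and a verifier that checks a proposed
# AI action against the specification relative to the world model. All names are
# hypothetical; a real verifier would produce a machine-checkable proof over all
# reachable states, not the single forward simulation used here.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WorldModel:
    """Mathematical description of how the AI system affects the outside world."""
    predict: Callable[[str], str]          # proposed action -> predicted effect


@dataclass
class SafetySpec:
    """Mathematical description of which effects are acceptable."""
    is_acceptable: Callable[[str], bool]


@dataclass
class ProofCertificate:
    """Auditable evidence that the action satisfies the spec relative to the model."""
    statement: str
    evidence: str


def verify(action: str, model: WorldModel, spec: SafetySpec) -> Optional[ProofCertificate]:
    """Toy verifier: certify an action only if its predicted effect is acceptable."""
    effect = model.predict(action)
    if spec.is_acceptable(effect):
        return ProofCertificate(
            statement=f"Action {action!r} satisfies the safety spec under the world model",
            evidence=f"predicted effect: {effect}",
        )
    return None   # no certificate means the action must not be executed
```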
