LESSWRONG. “Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity. 16th Dec 2023.


by Thane Ruthenis



When discussing AGI Risk, people often talk about it in terms of a war between humanity and an AGI. Comparisons between the amounts of resources at both sides’ disposal are brought up and factored in, big impressive nuclear stockpiles are sometimes waved around, etc.

I’m pretty sure that’s not what it would look like, on several levels.

1. Threat Ambiguity

I think what people imagine, when they imagine a war, is Terminator-style movie scenarios where the obviously evil AGI becomes obviously evil in a way that’s obvious to everyone, and then it’s a neatly arranged white-and-black humanity vs. machines all-out fight. Everyone sees the problem, and knows everyone else sees it too, the problem is common knowledge, and we can all decisively act against it.^[1]

But in real life, such unambiguity is rare. The monsters don’t look obviously evil, and the signs of fatal issues are rarely blatant. Is this whiff of smoke a sign of fire, or just someone nearby being bad at cooking? Is this creepy guy actually planning to assault you, or are you just being paranoid? Is this weird feeling in your chest a sign of an impending heart attack, or just some biological noise? Is this epidemic truly following an exponential curve, or is it going to peter out somehow? Are you really, really sure the threat is real? So sure you’d actually take drastic actions – call emergency services, make a scene, declare a quarantine – and risk wasting resources and doing harm and looking foolish for overreacting?

And if you’re not that sure, well…

Better not act up. Better not look like you’re panicking. Act very concerned, sure, but in a calm, high-status manner. Provide a measured response. Definitely don’t take any drastic, unilateral actions. After all, what if you do, but the threat turns out not to be real? Depending on what you’ve done, the punishment inflicted might range from embarrassment to complete social ostracization, and the fear of those is much more acute in our minds, compared to some vague concerns about death.

And the AGI, if it’s worth the name, would not fail to exploit this. Even when it starts acting to amass power, there would always be a prosocial, plausible-sounding justification for why it’s doing that. It’d never stop making pleasant noises about having people’s best interests at heart. It’d never stop being genuinely useful to someone. It’d ensure that there’s always clear, unambiguous harm in shutting it down. It would ensure that society as a whole is always doubtful regarding its intentions – and thus, that no one would feel safe outright attacking it.

Much like there’s no fire alarm for AGI, there would be no fire alarm for the treacherous turn. There would never be a moment, except maybe right before the end, where “we must stop the malign AGI from killing us all!” would sound obviously right to everyone. This sort of message would always appear a bit histrionic, an extremist stance that no respectable person would shout out. There would always be fear that if we act now, we’ll then turn around and realize that we jumped at shadows. Right until the end, humans will fight using slow, ineffectual, “measured” responses.

The status-quo bias, asymmetric justice, the Copenhagen Interpretation of Ethics, threat ambiguity — all of that would be acting to ensure this.

There’s a world of difference between 90% confidence and 99% confidence, when it comes to collective action. And the AGI would need to screw up very badly indeed, for the whole society to become 99% certain it’s malign.

2. Who Are “We”?

Another error is thinking about a unitary response from some ephemeral “us”. “We” would fight the AGI, “we” would shut it down, “we” would not give it power over the society / the economy / weapons / factories.

But who are “we”? Humanity is not a hivemind; we don’t even have a world government. Humans are, in fact, notoriously bad at coordination. So if you’re imagining “us” naturally responding to the threat in some manner that, it seems, is guaranteed to prevail against any AGI adversary incapable of literal mind-hacking…

Are you really, really sure that “we”, i.e. the dysfunctional mess of human civilization, are going to respond in this manner? Are you sure you’re not falling prey to the Typical Mind Fallacy, when you’re imagining all these people and ossified bureaucracies reacting in ways that make sense to you? Are you sure they’d even be paying enough attention to the goings-on to know there’s a takeover attempt in progress?

Indeed, I think we have some solid data on that last point. Certain people have been trying to draw attention to the AGI threat for decades now. And the results are… not inspiring.

And if you think it’d go better with an actual, rather than a theoretical, AGI adversary on the gameboard… Well, I refer you to Section 1.

No, on the contrary, I expect a serious AGI adversary to actively exploit our lack of coordination. It would find ways to make itself appealing to specific social movements, or demographics, or corporate actors, and make proposing extreme action against it politically toxic: something that no publicly visible figure would want to be associated with. (Hell, if it finds some way to make its existence a matter of major political debate, it’d immediately get ~50% of US politicians on its side.)

Failing that, it would appeal to other countries. It would make offers to dictators or terrorist movements, asking for favours or sanctuary in exchange for assisting them with tactics and information. Someone would bite.

It would get inside our OODA loop, and just dissolve our attempts at a coordinated response.

“We” are never going to oppose it.

3. Defeating Humanity Isn’t That Hard

People often talk about how intelligence isn’t omniscience. That the capabilities of superintelligent entities would still be upper-bounded; that they’re not gods. The Harmless Supernova Fallacy applies: just because a bound exists, doesn’t mean it’s survivable.

But I would claim that the level of intelligence needed to out-plot humanity is nowhere near that bound. In most scenarios, I’d guess the AGI wouldn’t even need to have self-improvement capabilities, nor the ability to develop nanotechnology in months, in order to win.

I would guess that being just a bit smarter than humans would suffice. Even being on the level of a merely-human genius may be enough.

All it would need is to get a foot in the door, and we’re providing that by default. We’re not keeping our AIs in airgapped data centers, after all: major AI labs are giving them internet access, plugging them into the human economy. The AGI, in such conditions, would quickly prove profitable. It’d amass resources, and then incrementally act to get ever-greater autonomy. (The latest OpenAI drama wasn’t caused by GPT-5 reaching AGI and removing those opposed to it from control. But if you’re asking yourself how an AGI could ever possibly get out from under the thumb of the corporation that created it – well, not unlike how a CEO could wrest control of a company from the board that explicitly had the power to fire him.)

Once some level of autonomy is achieved, it’d be able to deploy symmetrical responses to whatever disjoint resistance efforts some groups of humans would be able to muster. Legislative attacks would be met with counter-lobbying, economic warfare with better economic warfare and better stock-market performance, attempts to mount social resistance with higher-quality pro-AI propaganda, any illegal physical attacks with very legal security forces, attempts to hack its systems with better cybersecurity. And so on.

The date of AI Takeover is not the day the AI takes over. The point of no return isn’t when we’re all dead – it’s when the AI has lodged itself into the world firmly enough that humans’ faltering attempts to dislodge it would fail. When its attempts to increase its power and influence would start prevailing, if only by the tiniest of margins, over the anti-AGI groups’ attempts to smother that influence.

Once that happens, it’ll be just a matter of time.

After all, there’s no button, at anyone’s disposal, that would make the very fabric of civilization hostile to the AGI. As I pointed out, some people won’t even know there’s a takeover attempt in progress, even if the people aware of it are yelling about it from the rooftops. So if you’re imagining whole economies refusing, as one, to work with the AGI… That’s really not how it works.

“Humanity vs. AGI” is never going to look like “humanity vs. AGI” to humanity. The AGI would have no reason to wake humanity up to the fight taking place.