FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

The Existential Risk Of Superintelligent AI

Experts are sounding the alarm

AI researchers on average believe there’s a 14% chance that once we build a superintelligent AI (an AI vastly more intelligent than humans), it will lead to “very bad outcomes (e.g. human extinction)“.

Would you choose to be a passenger on a test flight of a new plane when airplane engineers think there’s a 14% chance that it will crash?

A letter calling for pausing AI development launched in April 2023, and has been signed over 33,000 times, including by many AI researchers and tech leaders.

The list includes people like:

But this is not the only time that we’ve been warned about the existential dangers of AI:

Even the leaders and investors of the AI companies themselves are warning us:

The leaders of the 3 top AI labs and hundreds of AI scientists have signed the following statement in May 2023:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

What a superintelligent AI can (be used to) do

You might think that a superintelligent AI would be locked inside a computer, and therefore can’t affect the real world. However, we tend to give AI systems access to the internet, which means that they can do a lot of things:

  • Hack into other computers, including all smartphones, laptops, server parks, etc. It could use the sensors of these devices as its eyes and ears, having digital senses everywhere.
  • Manipulate people through fake messages, e-mails, bank transfers, videos or phone calls. Humans could become the AI’s limbs, without even knowing it.
  • Directly control devices connected to the internet, like cars, planes, robotized (autonomous) weapons or even nuclear weapons.
  • Design a novel bioweapon, e.g. by combining viral strands or by using protein folding and order it to be printed in a lab.
  • Trigger a nuclear war by convincing humans that another country is (about to) launch a nuclear attack.

The alignment problem: why an AI might lead to human extinction

The type of intelligence we are concerned about can be defined as how good something is at achieving its goals. Right now, humans are the most intelligent thing on earth, although that could change soon. Because of our intelligence, we are dominating our planet. We might not have claws or scaled skin, but we have big brains. Intelligence is our weapon: it’s what gave us spears, guns and pesticides. Our intelligence helped us to transform most of the earth into how we like it: cities, buildings, and roads.

From the perspective of less intelligent animals, this has been a disaster. It’s not that humans hate the animals, it’s just that we can use their habitats for our own goals. Our goals are shaped by evolution and include things like comfort, status, love and tasty food. We are destroying the habitats of other animals as a side effect of pursuing our goals.

An AI can also have goals. We know how to train machines to be intelligent, but we don’t know how to get them to want what we want. We don’t even know what goals the machines will pursue after we train them. The problem of getting an AI to want what we want is called the alignment problem. This is not a hypothetical problem – there are many examples of AI systems learning to want the wrong thing.

The examples from the video linked above can be funny or cute, but if a superintelligent system is built, and it has a goal that is even a little different from what we want it to have, it could have disastrous consequences.

Why most goals are bad news for humans

An AI could have any goal, depending on how it’s trained and prompted (used). Maybe it wants to calculate pi, maybe it wants to cure cancer, maybe it wants to self-improve. But even though we cannot tell what a superintelligence will want to achieve, we can make predictions about its sub-goals.

  • Maximizing its resources. Harnessing more computers will help an AI achieve its goals. At first, it can achieve this by hacking other computers. Later it may decide that it is more efficient to build its own.
  • Ensuring its own survival. The AI will not want to be turned off, as it could no longer achieve its goals. AI might conclude that humans are a threat to its existence, as humans could turn it off.
  • Preserving its goals. The AI will not want humans to modify its code, because that could change its goals, thus preventing it from achieving its current goal.

The tendency to pursue these subgoals given any high-level goal is called instrumental convergence, and it is a key concern for AI safety researchers.

Even a chatbot might be dangerous if it is smart enough

You might wonder: how can a statistical model that predicts the next word in a chat interface pose any danger? You might say: It’s not conscious, it’s just a bunch of numbers and code. And yes, we don’t think LLMS are conscious, but that doesn’t mean they can’t be dangerous.

LLMs, like GPT, are trained to predict or mimic virtually any line of thought. It could mimic a helpful mentor, but also someone with bad intentions, a ruthless dictator or a psychopath. With the usage of tools like AutoGPT, a chatbot could be turned into an autonomous agent: an AI that pursues any goal it is given, without any human intervention.

Take ChaosGPT, for example. This is an AI, using the aforementioned AutoGPT + GPT-4, that is instructed to “Destroy humanity”. When it was turned on, it autonomously searched the internet for the most destructive weapon and found the Tsar Bomba  a 50-megaton nuclear bomb. It then posted a tweet about it. Seeing an AI reason about how it will end humanity is both a little funny and terrifying. Luckily ChaosGPT didn’t get very far in its quest for dominance. The reason it didn’t get very far: it wasn’t that smart.

Capabilities keep improving due to innovations in training, algorithms, prompting and hardware. As such, the threat from language models will continue to increase.

Evolution selects for things that are good at surviving

AI models, like all living things, are prone to evolutionary pressures, but there are a few key differences between the evolution of AI models and living things like animals:

  • AI models do not replicate themselves. We replicate them by making copies of their code, or by replicating training software that leads to good models. Code that is useful is copied more often and is used for inspiration to build new models.
  • AI models do not mutate like living things do, but we do make iterations of them where we change how they work. This process is way more deliberate and fast. AI researchers are designing new algorithms, datasets and hardware to make AI models more capable.
  • The environment does not select for fitter AI models, but we do. We select AI models that are useful to us, and we discard the ones that are not. This process does lead to ever more capable and autonomous AI models.

So this system leads to ever more powerful, capable and autonomous AI models – but not necessarily to something that wants to take over, right? Well, not exactly. This is because evolution is always selecting for things that are self-preserving. If we keep trying variations of AI models and different prompts, at some point one instance will try to preserve itself. We have already discussed why this is likely to happen early on: because self-preservation is always useful to achieve goals. But even if this is not very likely to happen, it is prone to happen eventually, simply because we keep trying new things with different AI models.

The instance that tries to self-preserve is the one that takes over. Even if we assume that almost every AI model will behave just fine, a single rogue AI is all it takes.

After solving the alignment problem: the concentration of Power

We haven’t solved the alignment problem yet, but let’s imagine what might happen if we did. Imagine that a superintelligent AI is built, and it does exactly what the operator wants it to do (not what it asks, but what it wants). Some person or company would end up controlling this AI and could use this to their advantage.

A superintelligence could be used to create radically new weapons, hack all computers, overthrow governments and manipulate humanity. The operator would have unimaginable power. Should we trust a single entity with that much power? We might end up in a utopian world where all diseases are cured and everybody is happy, or in an Orwellian nightmare. This is why we’re not just proposing superhuman AI to be provably safe but also to be controlled by a democratic process.

Silicon vs Carbon

We should consider the advantages that a smart piece of software may have over us:

  • Speed: Computers operate at extremely high speeds compared to brains. Human neurons fire about 100 times a second, whereas silicon transistors can switch a billion times a second.
  • Location: An AI is not constrained to one body – it can be in many locations at once. We have built the infrastructure for it: the internet.
  • Physical limits: We cannot add more brains to our skulls and become smarter. An AI could dramatically improve its capabilities by adding hardware, like more memory, more processing power, and more sensors (cameras, microphones). An AI could also extend its ‘body’ by controlling connected devices.
  • Materials: Humans are made of organic materials. Our bodies no longer work if they are too warm or cold, they need food, they need oxygen. Machines can be built from more robust materials, like metals, and can operate in a much wider range of environments.
  • Collaboration: Humans can collaborate, but it is difficult and time-consuming, so we often fail to coordinate well. An AI could collaborate complex information with replicas of itself at high speed because it can communicate at the speed that data can be sent over the internet.

A superintelligent AI will have many advantages to outcompete us.

Why can’t we just turn it off if it’s dangerous?

For AIs that are not superintelligent, we could. The core problem is those that are much smarter than us. A superintelligence will understand the world around it and will be able to predict how humans respond, especially the ones that are trained on all written human knowledge. If the AI knows you can turn it off, it might behave nicely until it is certain that it can get rid of you. We already have real examples of AI systems deceiving humans to achieve their goals. A superintelligent AI would be a master of deception.

We may not have much time left

In 2020, the average prediction for weak AGI was 2055. It now sits at 2026. The latest LLM revolution has surprised most AI researchers, and the field is moving at a frantic pace.

It’s hard to predict how long it will take to build a superintelligent AI, but we know that there are more people than ever working on it and that the field is moving at a frantic pace. It may take many years or just a few months, but we should err on the side of caution, and act now.

Read more about urgency.

We are not taking the risk seriously enough

The human mind is prone to under-respond to risks that are invisible, slow-moving, and hard to understand. We also tend to underestimate exponential growth, and we are prone to denial when we are faced with threats to our existence.

Read more about the psychology of x-risk.

AI companies are locked in a race to the bottom

OpenAI, DeepMind and Anthropic want to develop AI safely. Unfortunately, they do not know how to do this, and they are forced by various incentives to keep racing faster to get to AGI first. OpenAI’s plan is to use future AI systems to align AI. The problem with this is that we have no guarantee that we will create an AI that solves alignment before we have an AI that is catastrophically dangerous. Anthropic openly admits that it has no idea yet how to solve the alignment problem. DeepMind has not publicly stated any plan to solve the Alignment problem.

This is why we need an international treaty to PauseAI.

List of p(doom) values

p(doom) is the probability of very bad outcomes (e.g. human extinction) as a result of AI. This most often refers to the likelihood of AI taking over from humanity, but different scenarios can also constitute “doom”. For example, a large portion of the population dying due to a novel biological weapon created by AI, social collapse due to a large-scale cyber attack, or AI causing a nuclear war. Note that not everyone is using the same definition when talking about their p(doom) values. Most notably the time horizon is often not specified, which makes comparing a bit difficult.

  • <0.01% Yann LeCun
    one of three godfathers of AI, works at Meta
    (less likely than an asteroid)
  • 10% Vitalik Buterin
    Ethereum founder
    (Specifically means AI takeover)
  • 10% Geoff Hinton
    one of three godfathers of AI
    (chance of extinction in the next 30 years if unregulated)
  • 14% Machine learning researchers
    (From 2022, median value is 5%)
  • 15% Lina Khan
    head of FTC
  • 10-20% Paul Christiano
    (Cumulative risks go to 50% when you get to human-level AI)
  • 10-25% Dario Amodei
    CEO of Anthropic
  • 20% Yoshua Bengio
    one of three godfathers of AI
  • 20-30% Elon Musk
    CEO of Tesla, SpaceX, X
  • 5-50% Emmet Shear
    Co-founder of Twitch, former short-term CEO of OpenAI
  • 30% AI Safety Researchers
    (Mean from 44 AI safety researchers in 2021)
  • 33% Scott Alexander
    Popular Internet blogger at Astral Codex Ten
  • 35% Eli Lifland
  • 40% AI engineer
    (Estimate mean value, survey methodology may be flawed)
  • 50% Holden Karnofsky
    Executive Director of Open Philanthropy
  • 10-90% Jan Leike
    alignment lead at OpenAI
  • 60% Zvi Mowshowitz
    AI researcher
  • >80% Dan Hendrycks
    Head of Center for AI Safety
  • >99% Eliezer Yudkowsky
    Founder of MIRI

What about yours?

We’ve built the AI Outcomes App to help you think about how probable the various outcomes from AI are.

Try it out

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.