
FAQ on Catastrophic AI Risks

I have been hearing many arguments from different people regarding catastrophic AI risks. I wanted to clarify these arguments, first for myself, because I would really like to be convinced that we need not worry. However, sharing them and opening up the discussion more broadly might also be useful.

As a preamble, although I have been interested in the topic for about a decade, I am not an ethics expert. In the past, I have discussed several kinds of negative social impacts and risks of AI, some already concretely producing harms, such as those due to the amplification of discrimination and biases, or the concentration of expertise, power and capital aimed at the development of AI in the hands of a small non-representative group of humans (more often white men with computer science university education from the richest countries of the world), possibly at the expense of many others. See the Montreal Declaration for the Responsible Development of AI, Ruha Benjamin’s book Race After Technology, our recent work with the UN for an overview of AI social impact focused on human rights, or Virginia Eubanks’s book Automating Inequality. Concern for these ongoing harms has sometimes been viewed in opposition to concern for catastrophic risks from more advanced AI systems, with discussion of the latter seen as a distraction from work on the former. Some of the arguments made below challenge this opposition, suggesting instead that we should promote a regulatory framework that addresses all AI harms and risks and places human rights at its core. Note that catastrophic harms of concern include not only outcomes in which a large fraction of humans die but also those in which human rights and democracy are severely hurt. See for example my earlier post on scenarios that may yield rogue AIs and a detailed ontology of catastrophic scenarios in this recent paper, many of which go beyond the scenarios evoked below.

Below we call an AI superhuman if it outperforms humans on a vast array of tasks, and we call an AI superdangerous if it is superhuman and would pose a significant threat to humanity if it had goals whose execution could yield catastrophic outcomes. The skills that would make a superhuman AI superdangerous include strategic reasoning, social persuasion and manipulation, R&D of novel technologies, coding and hacking, etc. In fact, an AI need not be supremely intelligent or completely general, nor surpass humans on every task, to become a major threat, but it should be clear that greater intelligence over more domains increases the risks.

Before delving into these arguments, I found it useful, starting with myself, to go through a “train of thought process”: instead of directly trying to predict the possibility of future catastrophic consequences of AI, it may be useful to ask oneself questions about better-defined events whose sequence could yield catastrophic outcomes, hence the poll below, which I invite everyone, especially those with relevant expertise, to try for themselves. Aggregating the results from various groups of people may also be a useful exercise.

Poll for AI and policy experts

Because there is a lot of uncertainty about the future, it may be useful to consider the diversity of opinions regarding the probability of different events that could ultimately lead to catastrophes for humanity due to rogue AIs. Consider the following 4 statements:

  A. Assuming no structural and relevant regulatory change in our society, within 10 years, we will know how to build a superhuman AI system at a cost affordable to a midsize company.
  B. Assuming no structural and relevant regulatory change in our society and A is true, someone on Earth will intentionally instruct such an AI to achieve something whose consequences would be catastrophic if the AI were successful in achieving its goal.
  C. Assuming no structural and relevant regulatory change in our society and A is true, someone on Earth will unintentionally train or give instructions to such an AI that accidentally make it autonomous and dangerously misaligned (e.g., have a strong self-preservation goal or develop subgoals of its own, which could yield catastrophic outcomes if achieved).
  D. Assuming no structural and relevant regulatory change in our society, even if A happens and then B or C happen, we will be able to protect ourselves from catastrophe using existing defensive measures (e.g., our current laws, police, armies and cyberdefenses will prevent a superhuman AI from wreaking havoc).

Assign a value or a probability distribution to the four probabilities PA, PB, PC and PD (noting that they are all conditional probabilities) that the corresponding statements A, B and C (given A) or D (given A and B or C) are true. Given these four probabilities, we can approximately quantify the risk of catastrophic outcomes with the product PA × (1 − (1 − PB) × (1 − PC)) × (1 − PD) under the status-quo scenario where we do not take these potential risks sufficiently seriously ahead of time (whereas restricting access by a factor of 1000 would reduce the overall probability by almost as much). Since we do not know those probabilities for sure, we should average the above product over their values taken from a distribution, for example one obtained through a poll of experts. You may want to redo the poll after reading the discussion below. It should be clear when doing the poll (and thinking through the dialogue below) that it requires background in multiple areas of expertise, not just in AI.
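As a rough illustration of how such a poll could be aggregated, here is a minimal sketch of averaging the product over uncertain probabilities. The Beta distributions are made-up placeholders standing in for polled opinions; they are assumptions for illustration only, not values I am endorsing.

```python
# Minimal sketch (not from the original text): Monte Carlo average of the
# catastrophe-probability product over uncertain values of PA, PB, PC and PD.
# The Beta distributions below are made-up placeholders; in practice they
# would be fit to the poll responses.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of Monte Carlo samples

PA = rng.beta(2, 2, n)  # P(A): superhuman AI affordable to a midsize company within 10 years
PB = rng.beta(2, 5, n)  # P(B | A): someone intentionally gives it a catastrophic goal
PC = rng.beta(2, 5, n)  # P(C | A): someone unintentionally makes it dangerously misaligned
PD = rng.beta(2, 2, n)  # P(D | A and (B or C)): existing defenses protect us anyway

# The product from the text: PA × (1 − (1 − PB) × (1 − PC)) × (1 − PD).
risk = PA * (1 - (1 - PB) * (1 - PC)) * (1 - PD)

print("average estimated probability of catastrophe:", risk.mean())
```

The point of such a calculation is not the particular number it yields (the distributions above are invented) but that averaging over a spread of plausible opinions typically leaves a non-negligible residual risk.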

With this in mind, let us now delve into this difficult dialogue and its arguments, in the form of frequent questions and corresponding answers. The questions are asked from the angle of someone who believes that we should not worry about superdangerous AIs while the answers are given from the angle of someone who is concerned about these possibilities. Reflecting on the arguments below, some of the main points in favor of taking this risk seriously can be summarized as follows: (1) many experts agree that superhuman capabilities could arise in just a few years (but it could also be decades) (2) digital technologies have advantages over biological machines (3) we should take even a small probability of catastrophic outcomes of superdangerous AI seriously, because of the possibly large magnitude of the impact (4) more powerful AI systems can be catastrophically dangerous even if they do not surpass humans on every front and even if they have to go through humans to produce non-virtual actions, so long as they can manipulate or pay humans for tasks (5) catastrophic AI outcomes are part of a spectrum of harms and risks that should be mitigated with appropriate investments and oversight in order to protect human rights and humanity, including possibly using safe AI systems to help protect us.

Superdangerous AI

Q: Current state-of-the-art AI systems are far from human intelligence; they are missing fundamental pieces and have no intentions, and it may take decades or centuries before we bridge that gap, if we ever manage to at all, given the complex and insufficiently understood nature of intelligence.

A: I agree that we are missing fundamental pieces, but there are massive financial resources poured into AI, recently yielding an unexpectedly rapid acceleration in the competence of AI systems, especially in the mastery of language and the ability to capture understanding at an intuitive (i.e. system 1) level. Research on bridging the gap to superhuman capabilities is making progress, for example to improve system 2 abilities (reasoning, world model, causality, epistemic uncertainty estimation). If we are lucky, you are right and these projects to design superhuman AI may take many more decades, giving us more time to prepare and adapt, but it is also quite possible that the current proposals to introduce system 2 abilities to deep learning will lead to radically improved performance in a matter of a few years. My current estimate places a 95% confidence interval for the time horizon of superhuman intelligence at 5 to 20 years. We take actions to minimize future risks like pandemics, even in the presence of uncertainty about their timing. AI systems with intentions and goals already exist: most reinforcement learning (RL) systems have goals, specified via a reward function and sometimes even in natural language. As to whether human-level or superhuman AI is even possible, I would strongly argue that there is a scientific consensus that brains are biological machines and that there is no evidence of the inherent impossibility of building machines at least as intelligent as us. Finally, an AI system would not need to be better than us on all fronts in order to have a catastrophic impact (even an entity with minimal intelligence, such as a virus, could destroy humanity).

Q: In the process of research, we sometimes have the impression that we are making progress towards the main obstacle in front of us, that we are about to reach the top of the mountain (the challenge we are facing), but what often happens is that we realize later that there is another obstacle, another mountain that we could not see before reaching the top of this one. Why would it be different this time? There are still several open questions in AI research (such as hierarchical RL and system 2 deep learning) suggesting that simply scaling and engineering won’t be sufficient to reach human-level intelligence.

A: Very true, but my concern does not rest on the assumption that scaling and engineering will suffice. What also heavily colors this question in my mind is risk and its magnitude. Maybe there is a major obstacle towards superhuman AIs that we do not yet see. Or maybe not. It is very difficult to know, but what is sure is that billions of dollars are currently invested to accelerate the advances in AI capabilities, because of the success of ChatGPT. Faced with that uncertainty, the magnitude of the risk of catastrophes or worse, extinction, and the fact that we did not anticipate the rapid progress in AI capabilities of recent years, agnostic prudence seems to me to be a much wiser path. All of the open research questions you mention are actively being investigated. What if they pan out in the next few years?

Q: Since we do not yet understand exactly what a superhuman AI would look like, it is a waste of time to try to prevent such unknown risks. Could we have figured out airplane safety rules before the Wright brothers? Let’s fix the problems with very powerful AI systems when we understand them better.

A: I used to think exactly like this, thinking that superhuman intelligence was still far in the future, but ChatGPT and GPT-4 have considerably reduced my prediction horizon (from 20 to 100 years down to 5 to 20 years). With over 100 million users, we are well past the Wright brothers stage. These LLMs have also given us pretty good clues about what an AI can already do and what it is missing, and several research groups are working on these shortcomings. The unexpected speed at which LLMs have acquired their current level of competence simply because of scale suggests that we could also see the rest of the gap being filled in just a few years with minor algorithmic changes. Even if someone disagrees with the temporal horizon distribution, I don’t see how one could reject that possibility. I acknowledge your argument that it is difficult to come up with regulation and countermeasures for something that does not yet exist. However, there are examples of proposals to control dangerous technologies (including atomic power in the 1910s and AI in this century, or biological agents, which are regulated under a global regime that is agnostic to the exact pathogens that could be used) that did not rely on knowing the exact form of the technology. The other important element here is the slowness with which society adapts, not to mention the time governments need to implement policies and regulations. I believe that we should study and evaluate preventative measures that we could take as a society to reduce those risks and gradually prepare countermeasures, and we should get started as soon as possible. Generic policies, like monitoring and capability evaluation, licensing, reporting requirements and auditing of dangerous technologies, are applicable to all technologies. See also this discussion on the diversity of actions one should consider to mitigate catastrophic AI risks. Our lack of understanding and visibility of harm scenarios does, however, pose difficult dilemmas regarding regulation (e.g. see the Collingridge dilemma). Finally, going back to what a superhuman AI might look like, there is already a working hypothesis: take the current generative AI architectures and train them (as inference machines, see this blog post) with system 2 machinery and objectives (which admittedly need to be scaled up) so that they can also reason better, be more coherent and imagine plans and counterfactuals. It would still be a big neural net trained with some objective function and some procedure for generating examples (not just the observed data). We now have a lot of experience with such systems, and there are many open research questions about how to make them safe and trustworthy.

A: In addition, even if we do not fully master all the principles that explain our own intelligence (i.e. systems 1 and 2), digital computing technology can bring additional advantages over biological intelligence. For example, computers can parallelize learning across many machines thanks to high-bandwidth communication that enables them to exchange trillions of model parameters, while humans are limited to exchanging information at the rate of a few bits per second via language. As a result, computers can learn from much larger datasets (e.g. reading the whole internet), which is infeasible for a human within a lifetime – see Geoff Hinton’s arguments along this line, especially starting around 21m37s.

A: Finally, even if an AI is not stronger than humans on all cognitive abilities, it could still be dangerous if the aspects it masters (e.g. language but not robotics) are sufficient to wreak havoc, for example by using dialogue with humans to create a manipulative emotional connection and pay or influence them to act in the world in ways that could be very harmful, starting with destabilizing democracy even more than current social media does. We know that at least a subset of humans are very suggestible and can believe, for example, conspiracy theories with a conviction that is greatly out of proportion to the evidence. In addition, members of organized crime are likely to execute well-paid tasks without even knowing that they are being paid by an AI.

Genocidal humans and the danger of very powerful technologies

Q: There already are plenty of dangerous technologies, and humanity has survived (probably for good reasons, including our ability to adapt to danger), so why would it be different with AI?

A: Firstly, note that survival of humanity is a low bar; there are many examples of widespread harm enabled by powerful technologies (nuclear detonations, general weaponry use, chemical pollution, political polarization, racial discrimination) where our species survived (sometimes with close calls) but that are sufficiently serious to warrant preventative measures. Secondly, AI has attributes that make it particularly risky amongst technological innovations. The probability of catastrophic outcomes of a technology depends on a combination of many factors. These include the level of power of the technology, its autonomy and agency as well as its accessibility (how many people could use it). Taking nuclear technology as a comparison, it is not easy to successfully get your hands on nuclear material and the equipment to turn them into high-impact bombs. Operation of nuclear weapons is tightly controlled and accessible to very few people, whereas barriers to hacking computers are lower and more difficult to enforce; anybody can download software made available on the internet or use an API, without generally needing licensing or ethical certification. The development of natural language interfaces such as with ChatGPT means that one can give instructions to an AI system without even having to know how to program. Power enhances the dangers enabled by high accessibility – as our technologies grow more powerful, so too does the danger of wielding them. A similar paradigm is playing out in synthetic biology: with commercialization it has become easier for individuals to order new proteins or microbes bearing new DNA that would be difficult for a biologist to assess with respect to bioweapon potential. Finally, superhuman AI is a special category in the sense that we have never built technology smarter than us which could itself create even smarter versions of itself. Because AI systems are already capable of acting competently to achieve goals that do not correspond to human intentions (i.e. the AI alignment problem), autonomous superhuman AI systems have the potential to be superdangerous, in ways that previous technologies weren’t and that are inherently difficult to predict (because it is difficult to predict the behavior of entities much smarter than us). And turning a non-autonomous AI system like ChatGPT into one that has agency and goals can be done easily, as has been shown with Auto-GPT. Although our society already has self-protection mechanisms (e.g. against crime), they have been developed to defend against humans, and it is not clear how strongly they would hold against stronger forms of intelligence.

Q: Why would anyone in their right mind ask a computer to destroy humanity or a part of it or the foundations of our civilization?

A: History is full of cases of humans doing terrible things, including committing genocides or starting wars that end up killing a significant fraction of the people in their own camp. Humanity has proven itself to be highly capable of both malevolence and irrationality. There are many examples of game-theoretical dilemmas where individual incentives are not well aligned with global welfare (e.g., in an arms race, or in competition between companies that leads to reduced safety for increased performance) for lack of an adequate coordination mechanism. I am not reassured at all: although some or even a majority of humans may be compassionate and hold high ethical standards, it is sufficient that a few of them with violent or misguided intentions get access to very dangerous technology for major harm to follow. Chaos-GPT illustrated (as a joke for now) that one could simply instruct the AI to destroy humanity. Granted, and thankfully, the current level of competence of AI would not allow it (yet) to wreak havoc, but what about 5 or 10 years from now?

Q: I would instead argue that AI is not only already beneficial, it can bring immense benefits to humanity in the future, including to help us defend against criminal uses of AI and rogue AIs.

A: I agree that more powerful AI can be immensely useful but with this power also comes the possibility of more dangerous uses and thus a greater level of responsibility to avoid harm. In fact, existing AI systems (that are not superhuman and not general-purpose) are already safe (but not always fair and accurate) and can still be very useful. To benefit from the upside of more advanced AI, we need to reduce the downside risks: we have done that with other technologies in the past. I also agree that we could use AI systems to defend against misused or rogue AI systems. But to do this, we probably need safe and aligned AI in the first place and we need to massively grow R&D in these areas. Such good AIs could also help us mount more robust defenses against attack vectors, e.g., via pathogen detection, climate and biodiversity stability modeling, information ecosystem monitoring, cybersecurity, fraud tracking, etc. But I would not trust this alone to be a silver bullet protection: we need to reduce the risks on all fronts where we can, after evaluating the pros and cons of any preventative measure.

Q: Limiting access to superhuman AIs could have negative side-effects: it would reduce our freedoms, and it may also hurt our ability to fight a possible rogue AI with a diversity of safe AIs (which would hopefully be in the majority, since accidents and nefarious people would be the exception rather than the rule).

A: I agree there are trade-offs, but we have faced similar ones for other dangerous technologies. I believe that superhuman AI should not be developed and used by just anyone (as with nuclear technology, guns and planes), that the governance of superhuman AI should be entrusted to a broad and representative group of stakeholders with the well-being of all of humanity as a goal, and that the profits from AI should be redistributed for the benefit of all, all of which require strong democratic institutions.

A: More specifically, we only need to limit access to superhuman AI systems that are not demonstrably safe. When they are safe, they can help defend against rogue AIs; while they are unsafe, broad access seems rather unwise. I agree that there are trade-offs, and I agree that having a large and diverse set of safe and beneficial AIs of comparable intelligence should help us counter a rogue AI. However, the scenario I am most concerned with is one in which someone finds an algorithmic improvement which, when scaled up with the kind of massive training set and computation resources we already see, yields a major jump in intelligence, either much above human intelligence or much above the existing AI systems. There is always a first time for things like this, and at that moment, I surmise that the handlers of this superior AI system will have something like dynamite in their hands. They had better be people with high ethical standards who have been trained to follow very rigorous procedures (so that, for example, it is not a single human but a committee that takes the important decisions about what to ask the AI in its initial tests), in a way analogous to how we handle nuclear bombs and large quantities of nuclear material. In general, I am concerned with the speed at which the intelligence of AI systems could grow. If it is slow enough, then humans and our social organization have a chance to adapt and mitigate the risks. If it is too quick, the danger of mishaps increases greatly. Reducing access would indeed slow things down, but this may be a good thing. I believe that the safest path is to put the development of the most powerful AI systems in the hands of international organizations that are not furthering the interests of a single company or country but instead seek humanity’s welfare.

AI Alignment

Q: If we can build one or more superhuman AIs and instruct it to not harm humanity, it should be able to understand us, and thus our needs and values, which means that the AI alignment problem is a non-problem.

A: I wish you were right, but over a decade of research on AI alignment and reinforcement learning, as well as in economics, leaves us with little in the way of reassuring results, especially given the high stakes involved. Even if a superdangerous AI understands what we want, this does not mean that it will do what we want it to do. A fundamental issue is that it is difficult to make sure that AI systems understand our intentions and moral values. Even doing it among humans is difficult: societies have tried to do something like that with legal systems, but they are clearly imperfect, with corporations finding loopholes all the time. It thus appears very difficult to guarantee that what we ask of the machine is really what it understands it should do. As an illustration, see the 1970 science-fiction movie Colossus: The Forbin Project, or Stuart Russell’s book Human Compatible and his example of fossil fuel companies, which have been deceiving humanity for decades and bringing about massive harm (with much more to come) in pursuit of their profit objective. The recent use of reinforcement learning to fine-tune LLMs makes the AI try to please and convince human annotators, not necessarily tell the truth, which may even lead it to use deception to obtain rewards or provide unfaithful explanations. However, if we are willing to let go of the agency of AI systems, I am quite confident that we could build superhuman AI oracles that are useful and safe because they would have no agency, no autonomy, no goals and no self or self-preservation intention. Still, conceptually it would not be difficult to write a wrapper around such a system that yields an autonomous (and thus potentially dangerous) AI that uses the oracle to figure out how to achieve its goals. This is exactly what Auto-GPT did, with ChatGPT as the oracle. Thankfully, this is not yet dangerous because ChatGPT is not smarter than us (although, like a savant, it knows more facts than any of us). Hence, it is not sufficient to have a recipe for building a safe and useful AI; we also need the political and social environment to minimize the risk of someone not following those guidelines.

Q: I am pretty sure that in order to build aligned AI systems, it is enough to provide them with an objective or reward function that specifies what we want, or to design them in our image.

A: There is general agreement in the reinforcement learning (e.g. see these examples from DeepMind), economics and AI safety communities that doing this is very difficult, and the problem is amplified when the AI system optimizes a reward function that seemed like a good measure of what we care about before we used the AI to optimize for it (Goodhart’s law); there are even arguments that we may never be able to do it anywhere close to perfectly (starting with the fact that even among humans we do not agree on what we want, nor on how to formalize it). We already have misalignment between how we wish our current AI systems would behave and how they do, for example with respect to biases and discrimination or because of distributional shifts due to changes in the world. Furthermore, a slight misalignment between our actual intentions and what the AI system actually sees as a quantified objective is likely to be amplified by the difference in power or intelligence between the AI and us. Such differences among humans do not generally have such drastic consequences because most humans have comparable levels of intellect; yet we can see that when some humans have hugely more power than others, things may end up really poorly for those with much less power – and the union of many weaker humans (e.g. democracy) makes it possible to introduce a balancing force against the more powerful ones. By analogy, more powerful corporations are better able to find loopholes in the laws than less well-resourced ones, and can even change the laws themselves thanks to lobbying. If we design AI systems in our image, they will almost certainly have a self-preservation goal, which amounts to creating a new species, because they won’t be exactly like us. Those differences and that misalignment could end up drastically dangerous for humanity, just like the differences in goals between us and the species we drove extinct.
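To make Goodhart’s law more concrete, here is a toy sketch. The proxy and the true objective below are invented for this discussion and do not come from any real AI system: an optimizer that only sees a slightly misaligned proxy reward keeps improving that proxy while the true objective degrades, and more optimization pressure widens the gap.

```python
# Toy illustration (invented for this discussion) of Goodhart's law: hill
# climbing on a slightly misaligned proxy reward eventually makes the true
# objective worse, and more optimization widens the gap.
import numpy as np

rng = np.random.default_rng(0)

def true_objective(x):
    # What we actually care about: first feature near 1, second feature near 0.
    return -abs(x[0] - 1.0) - 0.1 * x[1] ** 2

def proxy_reward(x):
    # An imperfect proxy that mistakenly treats the second feature as desirable.
    return -abs(x[0] - 1.0) + 0.5 * x[1]

x = np.zeros(2)
for step in range(5001):
    candidate = x + 0.1 * rng.normal(size=2)  # random local search on the proxy only
    if proxy_reward(candidate) > proxy_reward(x):
        x = candidate
    if step % 1000 == 0:
        print(f"step {step:5d}  proxy={proxy_reward(x):8.2f}  true={true_objective(x):8.2f}")
```

The proxy keeps climbing as the spurious second feature is exploited, while the true objective gets worse: a small specification error, amplified by strong optimization.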

Q: Some argue that you cannot separate the intelligence machinery from the goals and thus swap in and out any kind of goal, and thus you could not have a goal that is in contradiction with the basic instructions to not harm humans.

A: It is generally true for humans that there are goals (like compassion) that we cannot easily swap out. On the other hand, there are plenty of examples of humans (a minority, thankfully) who can ignore their compassion instinct. In addition, humans are extremely good at taking on new goals: this is how companies work, this is how researchers work, this is how politicians work, etc. And although we cannot easily play with our own evolutionary programming, AI researchers routinely change the goals of learning machines: this is how reinforcement learning works and why a machine can be made to focus entirely on winning a game like Go. Finally, there is the problem I alluded to above, that humans might provide the nefarious goals, or simply impose another goal (like military victory) in which avoiding harm to humans is not an overriding imperative. In fact, specifying ‘real-world’ constraints like not harming humans is an unsolved research challenge. Harming humans can then become a side-effect of another, higher-priority goal. Stuart Russell gave the example of gorillas, which we are driving to extinction not because we have killing them as an explicit goal, but as an unintended side effect of more pressing goals (like profit).

Q: What about air gapping to prevent the AI system from directly acting in the world?

A: A lot of thought has gone into this sort of solution, and it might be part of the spectrum of mitigating measures (although none seems to be a silver bullet, as far as I can see). The problem with air gapping is that we still need some kind of dialogue between the AI system and its human operators, and humans can be influenced. By default, companies are incentivized to deploy their systems widely, to reap profits. With ChatGPT, that ship has sailed and the interface is used by hundreds of millions. Others let Auto-GPT act independently on the internet. Air gapping would also require making sure that the code and parameters of the AI systems do not leak or get stolen, and that even bad actors follow the same security procedures, which speaks for strong public policies, including at the international level.

Q: I do not think that we have solved the problem of training AI systems so that they would autonomously come up with their own subgoals, especially non-obvious misaligned ones.

A: You are right that hierarchical reinforcement learning is a very active area of research where many questions remain, but the algorithms we currently have can already figure out subgoals, even if they are not optimal. In addition, subgoals can emerge implicitly, as seems to happen with GPT-4. Research is needed to develop tools which can detect, evaluate and scrutinize the implicit goals and subgoals of AI systems, or build AI systems that are useful but cannot have any external goal, implicitly or explicitly.

Q: Why would superhuman AIs necessarily have survival and domination instincts like us and have goals that could lead to our extinction? We could just program them to be tools, not living things.

A: If we are not sufficiently careful, creating superhuman AIs may turn out to be like creating a new species, which I argue would turn them into superdangerous AIs. Our own evolutionary and recent history shows that smarter species can inadvertently act in ways that lead to the extinction of less smart species (other hominids, plus over 900 species driven extinct in the last 500 years). How do we make sure, or know for sure, that once the recipe for creating such superhuman AIs is available, no one will program one with a survival goal? The other concern is that, as discussed in the AI safety literature, the self-preservation objective may emerge as a convergent instrumental goal needed to achieve almost any other goal. Other emergent convergent goals include the objectives to acquire more power and control (i.e., dominate us) as well as to become smarter and acquire more knowledge. All of these goals tend to be useful subgoals for a vast number of other goals. We should certainly do our best to program AIs to behave in ways that would not hurt us, maybe following the Human Compatible approach, but if AIs are agents, i.e., they have implicit or explicit goals (even starting with those we give them), it is not yet clear how we could guarantee alignment. Alternatively, we could design AI systems that are basically just tools: their objective could be to understand the world without having any goals or direct plans or actions in the real world, except answering questions that are probabilistically truthful to their understanding of the world, in the sense of approximating the Bayesian posteriors over answers given the question and the available data. More research is needed on these topics, as well as on how to organize society to make sure that the safety guidelines we find are indeed followed all over the world.

In other words: this may be a good idea, but nobody knows how to reliably achieve this yet – it’s an open research problem.

Q: “If you realize it’s not safe, you just don’t build it”

A: Unfortunately, humans are not always wise; they may be greedy or malicious or hold deeply mistaken beliefs, as demonstrated many times in history. In addition, they may not realize that it is not safe and make an unwitting but grave mistake, or they may knowingly take excessive risks. An interesting example was the decision to go ahead with the first atomic bomb test (Trinity, 1945) in spite of uncertainty about whether the chain reaction could ignite the atmosphere.

Q: If we realize it is dangerous, we can just unplug the AI!

A: It would be great if we could, but, whether by design, because of the AI’s own (perhaps instrumental) self-preservation goal, or because of the incentives of the humans involved, many factors could make unplugging the AI difficult. See Oliver Sourbut’s overview of these unpluggability challenges, which he groups along factors such as these: the rapidity of the AI’s gain in power, the imperceptibility of these gains in power, robustness to unplugging attempts due to redundancy (software is very easily copied), self-replication abilities (not just of the AI but also of its attack vectors, like bioweapons or computer viruses), and our dependency (or the dependency of some of us, who may therefore be motivated to resist unplugging attempts) on the services rendered by the AI systems.

Many AI Risks 

Q: Putting the emphasis on existential risks is likely to remove attention from the current harms of AI and from marginalized voices who speak about the ongoing injustices associated with AI and other technologies.

A: This is a very important point. Many of us in the AI community have been advocates of AI regulation and AI ethics centered on social impact for many years (see our early work on the Montreal Declaration for the responsible use of AI, for example), and we do need to work on the current harms and risks to democracy and human rights as well. I do not think that it is an either/or choice: should we ignore future sea level rise from climate change because climate change is already causing droughts? In fact, what is needed on the path to address all AI risks is much greater governance, monitoring and regulation, with human rights and democracy (in the true sense of the word, of power to the people rather than power concentrated in a few hands) at the center of the stage. Let’s get started and accelerate the required reforms, making sure to bring all the voices into the required discussions. In fact, what I see unfolding with the current media attention given to AI existential risk is an acceleration of the political discussion on the need for AI governance and regulation, which is helping the cause of addressing current AI harms more than any previous attempt, e.g., as seen by the recent declarations from Joe Biden and Rishi Sunak. Additionally, there is a great overlap in the technical and political infrastructure required to mitigate both the fairness harms from current AI and the catastrophic harms feared from more powerful AI, i.e., having regulation, oversight, audits, tests to evaluate potential harm, etc. Finally, at a technical level, many of the current harms and concerns (like discrimination and bias, or the concentration of power in a few companies) belong to the larger concern about alignment: we build AI systems, and the corporations around them, whose goals and incentives may not be well aligned with the needs and values of society and humanity.

Q: It seems to me that in order to be rational about the various risks, we need to weigh them by their uncertainty, and those that are further into the future or involving scenarios we cannot clearly model should be greatly downweighted in our decision making. And since extinction scenarios are also extremely uncertain, they should basically be ignored.

A: It is true that risks should be weighted by their uncertainty, and this is one reason why I care so much about the current harms of AI, as well as about the current human misery that AI could already help us reduce. But one should also take into account, in this repugnant harm calculation, the magnitude of the possible harms. If a fraction of humanity dies or, much worse, the human species goes completely extinct, the magnitude of the harm is enormous, and many experts believe that the chance of this scale of impact is far from negligible, justifying our attention and preventative measures. Additionally, there is a difference between ‘unlikely’ and ‘uncertain’: when a scenario seems broadly plausible but the details are uncertain, the appropriate response is to invest deliberation into how we can clarify the details (and thereby learn how to address them), not to dismiss the scenario out of hand.
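As a back-of-the-envelope illustration of how magnitude enters this calculation (the numbers below are purely hypothetical, chosen only to show the arithmetic), even a small probability corresponds to an enormous expected harm when the stakes are this large:

```python
# Hypothetical numbers, only to illustrate how magnitude enters an expected-harm calculation.
p_catastrophe = 0.01             # an illustrative 1% chance of a civilization-scale catastrophe
people_at_stake = 8_000_000_000  # roughly the current world population

expected_harm = p_catastrophe * people_at_stake
print(f"{expected_harm:,.0f} lives lost in expectation")  # 80,000,000 -- far from negligible
```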

Q: I believe that AI-driven extinction is very unlikely or too uncertain, while overreaction to the fear of extinction could yield other kinds of catastrophic consequences, such as populist authoritarian governments using AI to install a Big Brother society, supposedly to make sure no one triggers an AI-driven extinction, for example with everyone being watched by the government’s AI through cameras around their necks and every keyboard being monitored.

A: We clearly need to work hard to avoid that Big Brother scenario. To clarify, I believe that protecting human rights and democracy is necessary in order to ultimately minimize the AI existential risks. An authoritarian government tends to care first and foremost about its power and does not have the checks and balances necessary to always take wise decisions (nor to put much weight on the well-being of those not in power). It can easily entertain strong and false beliefs (e.g., that the clique in power will be protected from possible mishaps with AI) that can lead to catastrophic decisions. And of course, democracy and human rights are core values to uphold. So even before we get to superhuman AI systems, we also need to worry about near-term AI destabilizing democracy through disinformation and manipulating humans through language, dialogue (possibly creating intimacy, as noted by Yuval Harari) and social media. We absolutely need to ban the counterfeiting of human identities as severely as we ban the counterfeiting of money, we need to identify machine-generated content as such, force in-person registration for any kind of internet account providing agency to the user, etc. I believe that doing all this would protect democracy and also reduce AI existential risk.

Openness and Democracy

Q: The discussions on existential risks are likely to bring about actions that contradict our human values, human rights, democracy, open science and open source, which both of us hold dear.

A: We need to preserve and even enhance democracy and human rights while reducing the catastrophic AI risks. A big, diverse group of people should be involved in making decisions as to which AI systems should be developed, how they will be programmed and what safety checks to run. To achieve this, we need regulation and policy expertise now. All humans should ultimately reap the profits from AI production. However, that does not mean that everyone should be allowed to own it. Regarding open source, Geoff Hinton asked, “How do you feel about the open-source development of nuclear weapons?” Many people outside the US, for instance, also believe that weapon ownership does not further democratic ideals. I understand your concern, especially in the light of some earlier proposals to manage existential risks with a Big Brother society. We need to resist the temptation of authoritarianism. I am convinced there are other, and in fact safer, paths. I believe that we need to find ways to continue the progress of science and technology in all the areas that do not endanger the public and society, and that means sharing results, code, etc. But we also need to increase monitoring, governance and oversight where human actions could yield rogue AI systems, or in any other scientific activity with potentially dangerous impact. This is exactly why we need ethics in science and why we have ethics boards in universities.

A: There are many precedents of impactful research and technology being closely monitored while delivering benefits to society. We already make compromises in our society between individual freedom and privacy on one hand and protecting the public on the other: for example, most countries regulate weapons, governments monitor significant flows of money, and some scientific areas, such as human cloning, genetic design and nuclear materials, are under greater scrutiny and limitations. We can have oversight and monitoring of potentially dangerous activities in a democratic society without having a Big Brother government. Most AI applications and systems are beneficial and do not create catastrophic risks, and we should in fact accelerate the development of AI for social good applications. Specialized AI systems are much safer by nature (they do not have a big-picture understanding of how the world works, of humans and society, i.e., they can make mistakes but we are unlikely to lose our control over them) and they can render immense services. The idea of an AI Scientist can be applied to specialized domains, for example.

Q: What you are suggesting would hurt AI open science and open source, and may thus slow us down in developing the kind of good AI that could help us fight the rogue AIs that may emerge anyway from organizations and countries that cheat on international treaties or simply don’t sign them. And governments will not accept that superhuman AI assistants whose design is hidden from them be delivered to their citizens.

A: These are important points. It may be a good idea to invest a lot more in AI safety, both in the sense of ‘how do we build safe AI systems’ and in the sense of ‘how do we build safe AI systems that will help us counter possible actions of rogue AI systems’. We clearly need to better understand the specific risks, such as rogue AI systems developing biological weapons (for example, how do we make it more difficult to order synthetic biology products without being properly registered as a trusted human?) or cybersecurity risks (current defenses are designed for single-piece-of-code attacks, where that piece of code is carefully crafted by a human, not for a vast diversity of pieces of code being simultaneously launched in AI-driven attacks, for example). At the same time, in order to reduce the probability of someone intentionally or unintentionally bringing about a rogue AI, we need to increase governance, and we should consider limiting access to the large-scale generalist AI systems that could be weaponized, which would mean that the code and neural net parameters would not be shared as open source and that some of the important engineering tricks to make them work would not be shared either. Ideally, this would stay in the hands of neutral international organizations (think of a combination of IAEA and CERN for AI) that develop safe and beneficial AI systems that could also help us fight rogue AIs. Reducing the flow of information would slow us down, but rogue organizations developing potentially superdangerous AI systems may also be operating in secret, and probably with less funding and fewer top-level scientists. Moreover, governments can help monitor and punish other states that start undercover AI projects. Governments could have oversight over a superhuman AI without its code being open-source. To minimize the associated risks, we would also need international agreements with real teeth. Finally, we need to prepare for the eventuality that, in spite of regulation and treaties, someone will create a rogue AI, and one tricky form of protection is to design (under the auspices of an international organization and with appropriate security measures) a safe superhuman AI that could help protect us from the actions of rogue AIs.

Desperation, hope and moral duty

Q: The cat is out of the bag, the toothpaste out of the tube, so it seems to me that it is too late to stop the development of superhuman AI. Governments are too slow to legislate, not to mention international treaties. And regulation is always imperfect and slows down innovation. Instead, I believe that we should accelerate the development of AI, which will bring a new age of enlightenment and well-being for all of humanity.

A: Even if the odds look bad, it is worth it to continue acting towards minimizing harm and maximizing well-being. Look at climate activists, who have good reasons to feel desperate. They keep going because even though harm is already happening, and it would have been better to act earlier, future harms can still be reduced. I believe that regulation, treaties and societal reforms that may help us control the catastrophic risks due to AI are in fact necessary for humanity to benefit from AI and bring the age of enlightenment and well-being that you envision. It is not sufficient to simply hope that all will go well: better safe than sorry.

Q: Isn’t all this discussion about superhuman AI just hype that serves the interests of a clique of AI experts and a small group of companies? Current AI systems, even GPT-4, are not that impressive, with many flaws having been pointed out.

A: I am hoping that the above discussion clarified the possible reasons for concern. I am of course not completely sure that superhuman AI is just a few years away. It could still be decades in the future. I actually hope so. But based on the rate of recent progress and my knowledge of ongoing research, there is a significant non-zero probability that the recipe for superhuman AI will build on what we have already discovered and that the missing pieces (which I believe are mostly system 2 abilities) will be uncovered within the next decade, as suggested by the distribution of responses to the poll from the AI researchers I consulted. Over 100 professors signed the recent statement on AI risk. That being said, we have to be careful that our preventative actions and policies will be oriented towards the empowerment and well-being of all humans and not magnify an already unfair concentration of power, e.g., in the hands of a few companies.

What can we conclude from this dialogue?

Please redo the poll to estimate the probability of the events leading to catastrophic outcomes. Have they changed?

Going through these arguments leaves me even more convinced that, precisely because of our disagreements about the future of AI, we need to chart a path that embraces all those possibilities. It also means that all AI risks, including AI safety, require more attention, more investment (in both technical and policy research) and national as well as international regulatory bodies working for the common good (not leaving it to commercial entities and individual governments or their military arms to self-regulate). It is essential to reduce the uncertainty about scenarios and about the effect of counter-measures, and this requires a major socio-technical research investment. We need to better anticipate and detail the possibly dangerous scenarios and elaborate appropriate policies to minimize these risks while balancing partially conflicting objectives (like speeding up progress on developing powerful and useful AI technology vs. limiting its ability to harm humans). In spite of odds that may seem discouraging (given past and current attempts at international coordination regarding global risks), it is our individual moral duty to invest more thought, care and action in directions that balance the minimization of future harms with societal development and advancement.
Acknowledgments: Yoshua Bengio thanks Niki Howe, Stuart Russell, Philippe Beaudoin, Andrew Critch, Jan Brauner, Xu Ji, Joseph Viviano, Konrad Körding, Charlotte Siegman, Eric Elmoznino, Sasha Luccioni, Andrew Jesson, Pablo Lemos, Edward Hu, Shahar Avin, Dan Hendrycks, Alex Hernandez-Garcia, Oly Sourbut, Nasim Rahaman, Fazl Barez, Edouard Harris and Michal Koziarski for feedback on the draft of this text.


June 24, 2023