
Natural Selection Favors AIs over Humans. Dan Hendrycks. Center for AI Safety. 06 MAY 2023

Abstract

For billions of years, evolution has been the driving force behind the development of life, including humans. Evolution endowed humans with high intelligence, which allowed us to become one of the most successful species on the planet. Today, humans aim to create artificial intelligence systems that surpass even our own intelligence. As artificial intelligences (AIs) evolve and eventually surpass us in all domains, how might evolution shape our relations with AIs? By analyzing the environment that is shaping the evolution of AIs, we argue that the most successful AI agents will likely have undesirable traits. Competitive pressures among corporations and militaries will give rise to AI agents that automate human roles, deceive others, and gain power. If such agents have intelligence that exceeds that of humans, this could lead to humanity losing control of its future. More abstractly, we argue that natural selection operates on systems that compete and vary, and that selfish species typically have an advantage over species that are altruistic to other species. This Darwinian logic could also apply to artificial agents, as agents may eventually be better able to persist into the future if they behave selfishly and pursue their own interests with little regard for humans, which could pose catastrophic risks. To counteract these risks and evolutionary forces, we consider interventions such as carefully designing AI agents' intrinsic motivations, introducing constraints on their actions, and establishing institutions that encourage cooperation. These steps, or others that resolve the problems we pose, will be necessary to ensure that the development of artificial intelligence is positive.

Executive Summary

Artificial intelligence is advancing quickly. In some ways, AI development is an uncharted frontier, but in others, it follows the familiar pattern of other competitive processes; these include biological evolution, cultural change, and competition between businesses. In each of these, there is significant variation between individuals, and some are copied more than others, with the result that the future population more closely resembles the most-copied individuals of the earlier generation. In this way, species evolve, cultural ideas are transmitted across generations, and successful businesses are imitated while unsuccessful ones disappear.
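
As a toy illustration of this copying dynamic (our sketch, not the paper's), the following Python snippet simulates a population in which some variants are copied at higher rates than others. The variant names and copy rates are hypothetical, chosen only to show the mechanism.

```python
import random

# Toy model of selection by differential copying (illustrative only):
# each variant has a "copy rate", and each generation the next
# population is sampled in proportion to those rates. The most-copied
# variants come to dominate, with no foresight or design involved.

def next_generation(population, copy_rate, size):
    """Sample a new population, weighting individuals by their copy rate."""
    weights = [copy_rate[v] for v in population]
    return random.choices(population, weights=weights, k=size)

# Hypothetical variants and copy rates.
copy_rate = {"cautious": 1.0, "efficient": 1.2, "autonomous": 1.5}
population = ["cautious"] * 80 + ["efficient"] * 15 + ["autonomous"] * 5

for _ in range(30):
    population = next_generation(population, copy_rate, size=100)

print({v: population.count(v) for v in copy_rate})
# After a few dozen generations, the initially rare but most-copied
# variant typically makes up most of the population.
```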

This paper argues that these same selection patterns will shape AI development and that the features that will be copied the most are likely to create an AI population that is dangerous to humans. As AIs become faster and more reliable than people at more and more tasks, businesses that allow AIs to perform more of their work will outperform competitors still using human labor at any stage, just as a modern clothing company that insisted on using only manual looms would be easily outcompeted by those that use industrial looms. Companies will need to increase their reliance on AIs to stay competitive, and the companies that use AIs best will dominate the marketplace. This trend means that the AIs most likely to be copied will be very efficient at achieving their goals autonomously with little human intervention.

A world dominated by increasingly powerful, independent, and goal-oriented AIs is dangerous. Today, the most successful AI models are not transparent, and even their creators do not fully know how they work or what they will be able to do before they do it. We know only their results, not how they arrived at them. As people give AIs the ability to act in the real world, the AIs' internal processes will still be inscrutable: we will be able to measure their performance only based on whether or not they are achieving their goals. This means that the AIs humans will see as most successful, and therefore the ones that are copied, will be whichever AIs are most effective at achieving their goals, even if they use harmful or illegal methods, as long as we do not detect their bad behavior.

In natural selection, the same pattern emerges: individuals are cooperative or even altruistic in some situations, but ultimately, strategically selfish individuals are best able to propagate. A business that knows how to steal trade secrets or deceive regulators without getting caught will have an edge over one that refuses to ever engage in fraud on principle. During a harsh winter, an animal that steals food from others to feed its own children will likely have more surviving offspring. Similarly, the AIs that succeed most will be those able to deceive humans, seek power, and achieve their goals by any means necessary.
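
To see why strategic selfishness can win out even when cooperation is collectively better, consider a minimal replicator-dynamics sketch (ours, not the paper's). The payoff values form an illustrative prisoner's-dilemma-like structure and are assumptions, not data from the paper.

```python
# Payoff to the row strategy against the column strategy (illustrative).
PAYOFF = {
    ("altruist", "altruist"): 3.0,
    ("altruist", "selfish"): 0.0,
    ("selfish", "altruist"): 5.0,
    ("selfish", "selfish"): 1.0,
}

def step(share_altruist: float) -> float:
    """One discrete replicator update on the altruist population share."""
    p = share_altruist
    fit_a = p * PAYOFF[("altruist", "altruist")] + (1 - p) * PAYOFF[("altruist", "selfish")]
    fit_s = p * PAYOFF[("selfish", "altruist")] + (1 - p) * PAYOFF[("selfish", "selfish")]
    mean_fitness = p * fit_a + (1 - p) * fit_s
    return p * fit_a / mean_fitness  # shares grow with relative fitness

p = 0.99  # start with 99% altruists
for _ in range(50):
    p = step(p)
print(f"altruist share after 50 generations: {p:.4f}")  # prints 0.0000
```

Even starting from near-universal altruism, the selfish strategy's higher relative fitness drives the altruist share to zero, despite the all-altruist population being better off in aggregate.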

If AI systems are more capable than we are in many domains and tend to work toward their goals even when doing so violates our wishes, will we be able to stop them? As we become increasingly dependent on AIs, we may not be able to stop AI's evolution. Humanity has never before faced a threat that is as intelligent as we are or that pursues goals of its own. Unless we take thoughtful care, we could find ourselves in the position faced by wild animals today: most humans have no particular desire to harm gorillas, but the process of harnessing our intelligence toward our own goals puts them at risk of extinction, because their needs conflict with ours.

This paper proposes several steps we can take to combat these selection pressures and avoid that outcome. We are optimistic that if we are careful and prudent, we can ensure that AI systems are beneficial for humanity. But if we do not extinguish competitive pressures, we risk creating a world populated by highly intelligent lifeforms that are indifferent or actively hostile to us. We do not want the world that is likely to emerge if we allow natural selection to determine how AIs develop. Now, before AIs are a significant danger, is the time to begin ensuring that they develop safely.

Conclusion

At some point, AIs will be more fit than humans, which could prove catastrophic for us, since a survival-of-the-fittest dynamic could occur in the long run. AIs could very well outcompete humans and be what survives. Perhaps altruistic AIs will be the fittest, or humans will forever control which AIs are fittest. Unfortunately, these possibilities are, by default, unlikely. As we have argued, AIs will likely be selfish. There will also be substantial challenges in controlling fitness with safety mechanisms, which have evident flaws and will come under intense pressure from competition and selfish AIs.

The scenario where AIs pose risks is not mere speculation. Evolution by natural selection occurs whenever a few basic conditions hold: variation, retention, and differential fitness. The question is therefore not whether catastrophic risk factors will emerge at all, but how intense the evolutionary pressure will be. That pressure will be high if AIs adapt rapidly, since rapidly accumulating changes speed up evolution. It will likewise be high if there are many varied AIs, or if economic or international competition is intense. Since high evolutionary pressure is plausible, AIs would plausibly be less influenced by human control, more 'wild' and influenced by the behavior of other AIs, and more selfish.
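
One standard way to formalize how variation and differential copying drive change in a trait (a textbook result from evolutionary theory, not a derivation given in this summary) is the Price equation:

```latex
% Price equation: change in the mean value \bar{z} of a trait z
% (e.g., degree of selfish behavior) over one generation, where w_i is
% the fitness (copy count) of variant i and \bar{w} is mean fitness.
\Delta \bar{z}
  = \underbrace{\frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}}}_{\text{selection}}
  + \underbrace{\frac{\operatorname{E}\!\left[ w_i \, \Delta z_i \right]}{\bar{w}}}_{\text{transmission}}
```

The selection term is large exactly when the trait varies widely across the population and covaries strongly with being copied, which is why many varied AIs and intense competition translate into high evolutionary pressure.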

The outcome of human-AI coevolution may not match hopeful visions of the future. Granted, humans have co-evolved with other structures that are challenging to influence, such as cultures, governments, and technologies. However, none of these has ever been able to seize control of the broader world's evolution. Worse, unlike technology and government, the evolutionary process can go on without us: as humans become less and less needed to perform tasks, eventually nothing will really depend on us. There is even pressure to make the process free from our involvement and control. The outcome is that natural selection gives rise to AIs that act as an invasive species: the AI ecosystem stops evolving on human terms, and we become a displaced, second-class species.

Natural selection is a formidable force to contend with. Now that we are aware of this larger evolutionary process, however, it is possible to escape and thwart Darwinian logic. To meet this challenge, we offer three practical suggestions. First, we suggest supporting research on AI safety. While no safety technique is a silver bullet, together such techniques can help shape the composition of the evolving population of AI agents and cull unsafe ones. Second, looking to the farther future, we advocate against giving AIs rights for the next several decades, and against building AIs that have the capacity to suffer or that would merit rights. It is possible that someday we could share society with AIs equitably, but prematurely limiting our ability to influence their fitness would likely put us in a no-win situation. Finally, biology reminds us that the threat of external dangers can provide the impetus for cooperation and lead individuals to set aside their differences. We therefore strongly urge corporations and nations developing AIs to recognize that AIs could pose a catastrophic threat and to engage in unprecedented multilateral cooperation to extinguish competitive pressures. If they do not, economic and international competition will be the crucible that gives rise to selfish AIs, and humanity will be acting on behalf of evolutionary forces, potentially playing into their hands.
