Illusions of AI consciousness.
The belief that AI is conscious is not without risk
Yoshua Bengio and Eric Elmoznino
Is the design of artificial intelligence (AI) systems that are conscious within reach? Scientists, philosophers, and the general public are divided on this question. Some believe that consciousness is an inherently biological trait specific to brains, which seems to rule out the possibility of AI consciousness. Others argue that consciousness depends only on the manipulation of information by an algorithm, whether the system performing these computations is made up of neurons, silicon, or any other physical substrate—so-called computational functionalism. Definitive answers about AI consciousness will not be attempted here; instead, two related questions are considered. One concerns how beliefs about AI consciousness are likely to evolve in the scientific community and the general public as AI continues to improve. The other regards the risks of projecting into future AIs both the moral status and the natural goal of self-preservation that are normally associated with conscious beings.
Computational functionalism has potentially profound implications for AI. As the field advances and systems replicate more of the complex mechanisms underlying human cognition, these systems might also implement the functions necessary for consciousness. Although science might one day reject computational functionalism and come up with alternative explanations that are broadly convincing, the current status quo holds that the idea is plausible—and AI consciousness along with it.
The technological advances in neuroscience of the past few decades have made it clear that conscious states, which can typically be reported by subjects, have specific observable neural signatures around which functionalist theories can be developed. Many such theories have gained substantial empirical support and can be used to make theoretically justified judgments in the case of AI. This methodology was recently applied in a study that identified a list of “indicators” for a number of leading functionalist theories of consciousness (1). The indicators associated with a given theory correspond to computational properties that are considered both individually necessary and jointly sufficient for a system to be conscious, if that theory is true. Notably, these indicators are sufficiently concrete that their presence or absence can be assessed in modern AI systems. The key suggestion of the study is that, to the degree that these theories are given any credence (and many researchers support these ideas), there should be more confidence that a particular AI system is conscious if it satisfies more of the indicators.
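To make the logic of the indicator approach concrete, the following minimal sketch (a hypothetical illustration, not the method of the study in reference 1) computes a credence-weighted fraction of satisfied indicators across several theories; the theory names, credences, and indicator counts are invented for illustration only.

```python
# Hypothetical sketch (not the method of reference 1): given per-theory lists of
# computational indicators and a credence in each theory, confidence that a system
# is conscious rises with the fraction of indicators it satisfies.
theories = {
    # theory name: (credence in the theory, indicators satisfied, indicators total)
    "global workspace": (0.3, 3, 5),
    "higher-order":     (0.2, 1, 4),
    "attention schema": (0.1, 2, 3),
}

def credence_weighted_score(theories: dict) -> float:
    """Credence-weighted fraction of satisfied indicators across theories."""
    return sum(cred * satisfied / total for cred, satisfied, total in theories.values())

print(f"overall indicator score: {credence_weighted_score(theories):.2f}")
```

Under such a scheme, a system that satisfies more indicators of theories held with higher credence receives a higher score, mirroring the study's qualitative suggestion without committing to any particular threshold.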
Despite the plethora of AI models that have been developed, it is unlikely that any existing system meets all of the criteria for consciousness set forth in any of the leading theories (1). However, the study also concludes that there are no fundamental barriers to constructing a system that does. Indeed, the set of tools available in modern AI is vast: There is evidence that neural networks can implement attention mechanisms, recurrence, information bottlenecks, predictive modeling, world modeling, agentic behavior, theory of mind, and other computational components considered crucial in leading functionalist theories of consciousness. As AI progresses, there is good reason to believe that it will satisfy more of these indicators for one very important reason: Many of the theories suggest that consciousness plays important functional roles for intelligence. Computational functions often associated with consciousness could provide advantages from the point of view of a learning agent (2). Reasoning, planning, efficiently digesting new knowledge, calibrated confidence, and abstract thought all require consciousness according to one theory or another. It is common for AI researchers to take inspiration from theories of consciousness when approaching these problems (3).
Although many might be convinced if an AI satisfies functional requirements from leading theories of consciousness, others will likely remain skeptical. In particular, some philosophers draw the distinction between what they call the “easy problem” of consciousness—identifying areas in the brain that appear to be active during a task that would seem to require consciousness—and the “hard problem” of explaining subjective experience from functional or computational principles alone (4). However, these intuitions, also known as the “explanatory gap,” are largely rooted in thought experiments that science might have the potential to explain away (5). For instance, the Attention Schema Theory of consciousness suggests that the brain constructs an internal model of neural attention mechanisms, and that this internal model is what is considered subjective awareness. Crucially, the information in this internal model need not be logically coherent; it is a useful “story” that the brain constructs, and that story can be full of the sorts of contradictions that could make us believe in a “hard problem” of consciousness (6).
Is there a functionalist explanation for certain signatures of subjective experience that appear mysterious and motivate the hard problem (7)? People have the intuitive sense that their subjective experiences are at once full of rich content and meaning, yet fundamentally ineffable: indescribable in the way that they can describe all other natural phenomena (e.g., a person can state what gravity is, but it seems fundamentally impossible to fully express what the color red evokes for them). The problem of ineffability in particular makes it appear as if conscious experiences simply cannot be explained in terms of information and function. One theory (7) explains richness and ineffability, along with the related properties that subjective experience is personal and fleeting, as a consequence of the contractive neural dynamics and stable states observed in the brain when conscious experiences arise (7–10). Contractive dynamics mathematically drive neural trajectories toward “attractors,” patterns of neural activity that are stable in time. These dynamics divide the set of possible neural activity vectors into a discrete collection of regions, one per attractor and its basin of attraction. The hypothesis, then, is that what is communicated through discrete words may reflect only the identity of the attractor (identifying it among all others, with a few bits of information) but not the full richness of the neural state corresponding to the attractor (nearly 10¹¹ neural firing frequencies) nor the fleeting trajectory into it. In this attractor dynamics account, the problems of richness, fleetingness, and ineffability dissolve: The richness is due to the immense number of neurons in the brain that constitute the attractor states and the trajectories toward them, and the ineffability is due to the fact that verbal reports are merely indexical labels for these attractors, unable to capture their high-dimensional meanings and associations because the attractor state itself and the recurrent synaptic weights that shape it differ from person to person.
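The following minimal sketch is hypothetical and illustrative only; the dimensionality, number of attractors, and update rule are arbitrary assumptions rather than details of the theory in reference 7. It shows how a contractive update drives a high-dimensional state into one of a handful of attractors, and why reporting the attractor's identity conveys only a few bits while the underlying state remains high-dimensional.

```python
# Illustrative toy model (arbitrary assumptions): a contractive update drives a
# high-dimensional "neural" state toward one of a few fixed attractors; a verbal
# report conveys only the attractor's index (a few bits), not the full state.
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000                                  # stand-in for a very high-dimensional neural state
attractors = rng.standard_normal((4, DIM))    # four stable activity patterns

def step(state, rate=0.2):
    """One contractive update: move part of the way toward the nearest attractor."""
    nearest = attractors[np.argmin(np.linalg.norm(attractors - state, axis=1))]
    return state + rate * (nearest - state)   # distance to that attractor shrinks each step

state = rng.standard_normal(DIM)              # a fleeting initial point on the trajectory
for _ in range(50):
    state = step(state)

report = int(np.argmin(np.linalg.norm(attractors - state, axis=1)))
print(f"verbal report: attractor #{report} (~2 bits for 4 attractors)")
print(f"underlying state: {DIM} real-valued activities, trajectory discarded")
```

In this toy, the report identifies which basin the trajectory fell into but says nothing about the thousands of coordinates of the settled state or the path taken to reach it, which is the intended analogy for richness and ineffability.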
Whether or not this theory convinces many people that there is no hard problem of consciousness is beside the point; rather, the essential issue is that new explanations of this nature are continuously being proposed, and some of them will inevitably convince some people. The overall historical trajectory of science has been clear in this regard. As more is discovered about the brain and about intelligence in general, the philosophical puzzle of consciousness will likely evaporate for more and more people, and as a result the scientific community will become increasingly willing to accept that artificial systems could be conscious. Indeed, even without current scientific consensus, most of the general public polled in a recent study (11) already believes that large language models could be conscious as a consequence of their human-like agentic behavior.
What might be the practical implications of a society that sees AI systems as conscious beings? Such a society might be inclined to treat them as though they have moral status, or rights akin to human rights. But whether or not this is the correct approach, institutions and legal frameworks will have to be substantially amended, and many questions arise about how to do so (12). For instance, AI systems will not be mortal and fragile, as humans are: Software and memories can be copied to survive indefinitely. But human mortality and fragility lie at the foundation of many of the principles that undergird social contracts in society. It is equally unclear how the notions of justice and equality that ground many social norms and political systems could apply when some of the “persons” are substantially more intelligent than humans (calling into question what kind of equality is at stake) and have resource needs very different from those of humans (calling into question how to adjudicate questions of justice). Further, it may be inaccurate to think of AI systems as individuals when a group of AI-driven computers share information and goals to coordinate their actions, and when that group can grow arbitrarily as more computational resources become available to it.
More specific concerns arise if some humans, inspired by the appearance of consciousness, grant to AIs the self-preservation objective shared by all living beings. There is good reason to worry that maximizing any objective function that entails self-preservation, either as a direct or an instrumental goal, will lead to an AI acting to make sure humans can never turn it off. A sufficiently intelligent AI with the goal of self-preservation, anticipating the possibility of humans turning it off, would naturally develop subgoals to control humans or get rid of them altogether (13). Another concern is that if legal systems are amended to recognize rights akin to “life, liberty, and the pursuit of happiness” in self-preserving AI systems, then humans risk creating rights that conflict with their own. Human safety might recommend shutting down a given class of systems, but if those systems have a right to survival, the room to maneuver in compliance with the law may be limited (14). Compare the situation in nuclear disarmament: Matters are complicated enough, even though no one argues that the bombs themselves have a right to be kept viable.
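A toy calculation, hypothetical and not drawn from the article or reference 13 (all numbers are arbitrary assumptions), illustrates why self-preservation can emerge as an instrumental subgoal: an agent that merely maximizes cumulative task reward prefers any plan that prevents shutdown, because shutdown ends the reward stream.

```python
# Hypothetical toy (arbitrary numbers): a reward-maximizing agent compares a plan
# that allows shutdown with one that disables the off switch; the latter wins
# whenever the forgone reward exceeds the cost of interference.
HORIZON = 100            # planning horizon, in time steps
TASK_REWARD = 1.0        # reward per step while the agent is running
SHUTDOWN_STEP = 10       # step at which humans would switch the agent off
DISABLE_COST = 5.0       # one-time cost of interfering with the off switch

def expected_return(disable_off_switch: bool) -> float:
    steps_running = HORIZON if disable_off_switch else SHUTDOWN_STEP
    return steps_running * TASK_REWARD - (DISABLE_COST if disable_off_switch else 0.0)

plans = {plan: expected_return(plan) for plan in (False, True)}
best = max(plans, key=plans.get)
print(plans)   # {False: 10.0, True: 95.0} under the assumed numbers
print("chosen plan: disable off switch" if best else "chosen plan: allow shutdown")
```

Nothing in this sketch gives the agent an explicit survival goal; avoiding shutdown follows purely from maximizing the assumed objective, which is the core of the instrumental-goal worry.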
The current trajectory of AI research may be moving society toward a future in which substantial portions of the general public and scientific community believe that AI systems are conscious. As things stand currently, AI science does not know how to build systems that will share human values and norms, and society possesses neither the legal nor ethical frameworks needed to incorporate conscious-seeming AI. But this trajectory is not inevitable. Until there is a better grasp on these problems, humans have the power to avoid putting themselves in such dangerous situations in the first place, opting instead to build AI systems that both seem and function more like useful tools and less like conscious agents (15).
Acknowledgments
Y.B. is co-president and scientific director of LawZero, a nonprofit that advances safe-by-design AI systems. Y.B. acknowledges support from the Canadian Institute for Advanced Research and the Natural Sciences and Engineering Research Council of Canada. E.E. acknowledges support from a Vanier Canada Graduate Scholarship.