Instrumental Convergence
Instrumental convergence, or convergent instrumental values, is the theorized tendency for most sufficiently intelligent agents to pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition [1]. The concept has also been discussed under the term "basic AI drives".
The idea was first explored by Steve Omohundro, who argued that sufficiently advanced AI systems would all naturally discover similar instrumental subgoals. The view that there are important basic AI drives was subsequently defended by Nick Bostrom as the instrumental convergence thesis, or the convergent instrumental goals thesis. On this view, a few goals are instrumental to almost all possible final goals; therefore, nearly all advanced AIs can be expected to pursue these instrumental goals. Omohundro supports the idea with von Neumann's microeconomic theory of rational agents.
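The microeconomic picture Omohundro appeals to is von Neumann-Morgenstern expected-utility maximization. In standard textbook notation (introduced here only for orientation, not taken from the original text), a rational agent with utility function $u$ over outcomes $O$ chooses

$$a^{*} \in \arg\max_{a \in A} \sum_{o \in O} P(o \mid a)\, u(o)$$

and the convergence claim is that, for very different choices of $u$, the maximizing actions share common subgoals such as survival and resource acquisition.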
Omohundro’s Drives
Omohundro presents two sets of values, one for self-improving artificial intelligences [2] and another he says will emerge in any sufficiently advanced AGI system [3]. The former set is composed of four main drives:
- Self-preservation: A sufficiently advanced AI will probably be the best entity to achieve its own goals, so it must continue existing in order to maximize goal fulfillment. Similarly, if its goal system were modified, it would begin pursuing different ends; since this is undesirable by the lights of its current goals, the AI will act to preserve the content of its goal system (a toy illustration of this argument follows the list).
- Efficiency: At any time, the AI will have finite resources of time, space, matter, energy, and computational power. Using these more efficiently will increase its utility, leading it to implement more efficient algorithms, physical embodiments, and internal mechanisms. It will also lead the AI to replace desired physical events with computational simulations wherever possible, to expend fewer resources.
- Acquisition: Resources like matter and energy are indispensable for action. The more resources the AI can control, the more actions it can perform to achieve its goals. The AI’s physical capabilities are determined by its level of technology. For instance, if the AI could invent nanotechnology, it would vastly increase the actions it could take to achieve its goals.
- Creativity: The AI’s operations will depend on its ability to come up with new, more efficient ideas. It will be driven to acquire more computational power for raw searching ability, and it will also be driven to search for better search algorithms. Omohundro argues that the drive for creativity is critical for the AI to display the richness and diversity that is valued by humanity. He discusses signaling goals as particularly rich sources of creativity.
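To make the goal-preservation argument concrete, here is a minimal sketch of an expected-utility maximizer that scores every candidate future with its current utility function. All states, goals, and values below are invented for illustration; this is not code from Omohundro's papers:

```python
# Toy sketch of the goal-preservation argument. The key move: a proposed
# change to the utility function is evaluated with the CURRENT utility
# function, so the modification is refused.

def u_paperclips(state):
    """Current goal: value worlds full of paperclips."""
    return 1.0 if state == "many_paperclips" else 0.0

def u_staples(state):
    """Proposed replacement goal: value worlds full of staples."""
    return 1.0 if state == "many_staples" else 0.0

def resulting_state(goal):
    """A competent agent steers the world toward whatever goal it holds."""
    return "many_paperclips" if goal is u_paperclips else "many_staples"

def consider_modification(current_goal, proposed_goal):
    # Both possible futures are scored by the current goal, not the new one.
    value_if_refused = current_goal(resulting_state(current_goal))
    value_if_accepted = current_goal(resulting_state(proposed_goal))
    return "accept" if value_if_accepted > value_if_refused else "refuse"

print(consider_modification(u_paperclips, u_staples))  # -> refuse
```

Nothing about the content of the goal matters here; the refusal falls out of the fact that both futures are evaluated under the goal the agent currently holds.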
Bostrom’s Drives
Bostrom argues for an orthogonality thesis: intelligence and final goals vary independently, so more or less any level of intelligence could in principle be combined with more or less any final goal. But he also argues that, despite this independence of values and intelligence, any recursively self-improving intelligence would likely possess a particular set of instrumental values that are useful for achieving any kind of terminal value [4]. On his view, those values are:
- Self-preservation: A superintelligence will value its continuing existence as a means to continuing to take actions that promote its values.
- Goal-content integrity: The superintelligence will value retaining the same preferences over time. Modifications of its future values, through swapping memories, downloading skills, or altering its cognitive architecture and personality, would transform it into an agent that no longer optimizes for the same things.
- Cognitive enhancement: Improvements in cognitive capacity, intelligence, and rationality will help the superintelligence make better decisions and further its goals more effectively in the long run.
- Technological perfection: Increases in hardware power and algorithm efficiency will deliver increases in its cognitive capacities. Also, better engineering will enable the creation of a wider set of physical structures using fewer resources (e.g., nanotechnology).
- Resource acquisition: Basic resources such as time, space, matter, and free energy can be processed to serve almost any goal, in the form of extended hardware, backups, and protection, in addition to guaranteeing the superintelligence's continued existence.
Relevance
Both Bostrom and Omohundro argue these values should be used in trying to predict a superintelligence’s behavior, since they are likely to be the only set of values shared by most superintelligences. They also note that these values are consistent with safe and beneficial AIs as well as unsafe ones.
Bostrom emphasizes, however, that our ability to predict a superintelligence’s behavior may be very limited even if it shares most intelligences’ instrumental goals.
Yudkowsky echoes Omohundro's point that the convergence thesis is consistent with the possibility of Friendly AI. However, he also notes that the thesis implies that most AIs will be extremely dangerous merely by being indifferent to one or more human values [5].
Pathological Cases
In some rare cases, AIs may not pursue these goals. For instance, if two AIs have the same goals, the less capable one may determine that it should destroy itself to allow the stronger one to control the universe. Or an AI may have the goal of using as few resources as possible, or of being as unintelligent as possible. Such unusually specific goals will limit the growth and power of the AI.
Experimental Evidence
The question of whether instrumentally convergent drives can arise in machine learning models is explored in the paper "Optimal Policies Tend To Seek Power" [6]. The authors study instrumental convergence (specifically, power-seeking behavior) as a statistical tendency of optimal policies in reinforcement learning (RL).
The authors focus on Markov Decision Processes (MDPs) and prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power. They formalize power as the ability to achieve a wide range of goals, and show that under this formalization most reward functions make it optimal to seek power, since power keeps a wide range of options available to the agent.
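To give the flavor of the result, here is a small numerical sketch, simplified for illustration only: the paper's actual POWER definition and theorems are more careful (roughly, POWER is an average of optimal value attainable from a state under a distribution over reward functions), and the environment, discount factor, and sample count below are invented. It samples many reward functions at random over a toy deterministic environment and compares each state's optimal value on average:

```python
# Toy power-seeking demo: states that keep more states reachable have higher
# optimal value on average across randomly sampled reward functions.
import random

# Deterministic transitions: state -> list of successor states.
# "hub" keeps three options open; each other state only loops to itself.
GRAPH = {
    "hub":      ["room_a", "room_b", "dead_end"],
    "room_a":   ["room_a"],
    "room_b":   ["room_b"],
    "dead_end": ["dead_end"],
}
GAMMA = 0.9

def optimal_value(reward, iters=200):
    """Value iteration for a deterministic MDP with state-based rewards."""
    v = {s: 0.0 for s in GRAPH}
    for _ in range(iters):
        v = {s: reward[s] + GAMMA * max(v[t] for t in GRAPH[s]) for s in GRAPH}
    return v

def average_optimal_value(samples=2000):
    """Mean optimal value of each state over random reward functions."""
    totals = {s: 0.0 for s in GRAPH}
    for _ in range(samples):
        reward = {s: random.random() for s in GRAPH}  # R(s) ~ Uniform[0, 1]
        v = optimal_value(reward)
        for s in GRAPH:
            totals[s] += v[s]
    return {s: totals[s] / samples for s in GRAPH}

for state, value in average_optimal_value().items():
    print(f"{state:>8}: {value:.2f}")
# "hub" averages the highest optimal value: whatever reward function is
# drawn, an agent at the hub can still steer toward the state it favors.
```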
These results provide a counter to the claim that instrumental convergence is merely an anthropomorphic theoretical tendency, and that human-like power-seeking instincts will therefore not arise in RL agents.
See Also
- Convergent instrumental strategies (Arbital)
- Instrumental convergence (Arbital)
- Orthogonality thesis
- Cox’s theorem
- Unfriendly AI, Paperclip maximizer, Oracle AI
- Instrumental values
References
1. Omohundro, S. (2007). "The Nature of Self-Improving Artificial Intelligence."
2. Omohundro, S. (2008). "The Basic AI Drives." Proceedings of the First AGI Conference.
3. Omohundro, S. (2012). "Rational Artificial Intelligence for the Greater Good."
4. Bostrom, N. (2012). "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents." Minds and Machines.
5. Shulman, C. (2010). "Omohundro's 'Basic AI Drives' and Catastrophic Risks."
6. Turner, A. M., Smith, L. R., Shah, R., Critch, A., & Tadepalli, P. (2021). "Optimal Policies Tend To Seek Power."