
Power Seeking (AI) – LessWrong

Power seeking is a property that agents might have, whereby they attempt to gain a more general ability to control their environment. It's particularly relevant to AIs and is related to Instrumental Convergence.

Instrumental convergence in single-agent systems – Summary of the sequence. LessWrong. 12 OCT 2022

Over the past few months, we’ve been investigating instrumental convergence in reinforcement learning agents. We started from the definition of single-agent POWER proposed by Alex Turner et al., extended it to a family of multi-agent scenarios that seemed relevant to AI alignment, and explored its implications experimentally in several RL environments.
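For readers who haven't seen it, Turner et al. define single-agent POWER as, roughly, a state's average optimal value over a distribution $\mathcal{D}$ of reward functions. Up to notational differences from their paper, the definition reads:

$$\mathrm{POWER}_{\mathcal{D}}(s, \gamma) \;=\; \frac{1 - \gamma}{\gamma}\, \mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^{*}_{R}(s, \gamma) - R(s) \right]$$

where $V^{*}_{R}(s, \gamma)$ is the optimal value of state $s$ under reward function $R$ at discount factor $\gamma$. Intuitively, a state has high POWER when the agent can do well starting from it for most goals it might be handed.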

The biggest takeaways are:

  1. Alignment of terminal goals and alignment of instrumental goals are sharply different phenomena, and we can quantify and visualize each one separately.
  2. If two agents have unrelated terminal goals, their instrumental goals will tend to be misaligned by default. The agents in our examples tend to interact competitively unless we make an active effort to align their terminal goals.
  3. As we increase the planning horizon of our agents, instrumental value concentrates into a smaller and smaller number of topologically central states — for example, positions in the middle of a maze.

Overall, our results suggest that agents that aren't competitive with respect to their terminal goals nonetheless tend, on average, to become emergently competitive with respect to how they value instrumental states (at least in the settings we looked at). This constitutes direct experimental evidence for the instrumental convergence thesis.
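To make the POWER calculation concrete, here is a minimal sketch (ours, not the sequence's released codebase) of how one might estimate single-agent POWER on a toy gridworld by sampling reward functions and running value iteration. The grid size, the uniform reward distribution, and the discount factors compared are all illustrative assumptions:

# Minimal sketch (illustrative, not the sequence's released code) of
# estimating Turner et al.'s single-agent POWER on a toy 5x5 gridworld:
#   POWER_D(s, gamma) = (1 - gamma) / gamma * E_{R ~ D}[ V*_R(s) - R(s) ],
# with R(s) i.i.d. Uniform[0, 1] across states (an assumed distribution).
import numpy as np

rng = np.random.default_rng(0)

N = 5  # grid side length (assumption)
states = [(r, c) for r in range(N) for c in range(N)]
idx = {s: i for i, s in enumerate(states)}

def successors(s):
    # Deterministic moves: stay, up, down, left, right (walls clip moves).
    r, c = s
    cand = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [idx[p] for p in cand if 0 <= p[0] < N and 0 <= p[1] < N]

succ = [successors(s) for s in states]

def optimal_values(reward, gamma, iters=500):
    # Value iteration: V(s) = R(s) + gamma * max over successor states.
    v = np.zeros(len(states))
    for _ in range(iters):
        v = reward + gamma * np.array([max(v[j] for j in succ[i])
                                       for i in range(len(states))])
    return v

def power(gamma, n_samples=200):
    # Monte Carlo estimate of POWER over sampled reward functions.
    total = np.zeros(len(states))
    for _ in range(n_samples):
        r = rng.uniform(0.0, 1.0, size=len(states))
        total += optimal_values(r, gamma) - r
    return (1 - gamma) / gamma * total / n_samples

# Compare a central state with a corner state at two planning horizons.
for gamma in (0.5, 0.99):
    p = power(gamma)
    print(f"gamma={gamma}: POWER(center)={p[idx[(2, 2)]]:.3f}, "
          f"POWER(corner)={p[idx[(0, 0)]]:.3f}")

In a sketch like this one would expect the central state to score higher than the corner, since it reaches more of the grid in fewer steps; the sequence's actual experiments use their own environments and multi-agent extensions of the definition.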

We’ll soon be open-sourcing the codebase we used to run these experiments; we hope this will make it easier for other folks to reproduce and extend them. If you’d like to be notified when it’s released, email Edouard at edouard@gladstone.ai, or DM me here or on Twitter at @harris_edouard.

Edouard Harris – New Research: Advanced AI may tend to seek power *by default*. 22 OCT 2023.

What does power seeking really mean? And what does all this imply for the safety of future, general-purpose reasoning systems? Edouard Harris, an AI alignment researcher and Jeremie’s co-founder at the AI safety company Gladstone AI, comes back on the TDS Podcast to discuss AI’s potential ability to seek power.

Intro music:
➞ Artist: Ron Gelinas
➞ Track Title: Daybreak Chill Blend (original mix)
➞ Link to Track: • Ron Gelinas – Day…

0:00 Intro
4:00 Alex Turner’s research
7:45 What technology wants
11:30 Universal goals
17:30 Connecting observations
24:00 Micro power seeking behaviour
28:15 Ed’s research
38:00 The human as the environment
42:30 What leads to power seeking
48:00 Competition as a default outcome
52:45 General concern
57:30 Wrap-up
