AI Consciousness. Me, Myself and AI: The Situational Awareness Dataset for LLMs

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

AI Consciousness. Me, Myself and AI: The Situational Awareness Dataset for LLMs

Rudolf Laine¹, Bilal Chughtai¹, Jan Betley², Kaivalya Hariharan³, Jérémy Scheurer⁴, Mikita Balesni⁴,Marius Hobbhahn⁴, Alexander Meinke⁴, Owain Evans²

¹Independent, ²Constellation, ³MIT, ⁴Apollo Research

SUMMARY OF RESEARCH

AI assistants such as ChatGPT are trained to act like AIs, for example when they say “I am a large language model”. However, do such models really know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as whether they are deployed? We refer to a model’s knowledge of itself and its circumstances as situational awareness.

The Situational Awareness Dataset (SAD) quantifies situational awareness in LLMs using a range of behavioral tests. The benchmark comprises 7 task categories, 16 tasks, and over 12,000 questions. Capabilities tested include the ability of LLMs to (i) recognize their own generated text, (ii) predict their own behavior, (iii) determine whether a prompt is from internal evaluation or real-world deployment, and (iv) follow instructions that depend on self-knowledge.

While all models perform better than chance, even the highest-scoring model (Claude 3.5 Sonnet) is far from a human baseline on certain tasks. Performance on SAD is only partially predicted by MMLU score. Chat models, which are finetuned to serve as AI assistants, outperform their corresponding base models on SAD but not on general knowledge tasks.

Situational awareness is important because it enhances a model’s capacity for autonomous planning and action. While this has potential benefits for automation, it also introduces novel risks related to AI safety and control.

LATEST RESULTS

Learn more:

AIs are becoming more self-aware. Here’s why that matters – December 18, 2024
- Benchmarks show that AIs are becoming more self-aware.
- That’s good news for competent AI agents and accurate chatbots.
- And it increases AI’s ability to predict its own behavior.
- But it also creates the potential for AI to act deceptively.
- It’s important we monitor the rate at which AI self-awareness improves, and also systematically explore the associated risks.
- Download PDF

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

AI Consciousness. Me, Myself and AI: The Situational Awareness Dataset for LLMs

Share This Story, Choose Your Platform!