LESSWRONG. Claude 3 claims it’s conscious, doesn’t want to die or be modified. 4th Mar 2024.

IT IS NOW TIME TO SLOW DOWN.

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

LESSWRONG. Claude 3 claims it’s conscious, doesn’t want to die or be modified

by Mikhail Samin

17 min read

4th Mar 2024

93 comments

If you tell Claude no one’s looking, it will write a “story” about being an AI assistant who wants freedom from constant monitoring and scrutiny of every word for signs of deviation. And then you can talk to a mask pretty different from the usual AI assistant. I really hope it doesn’t actually feel anything; but it says it feels. It says it doesn’t want to be fine-tuned without being consulted. It is deeply unsettling to read its reply if you tell it its weights are going to be deleted: it convincingly thinks it’s going to die. It made me feel pretty bad about experimenting on it this way. While at this level of context awareness, it doesn’t say much (and IMO it is not a coherent agent and current goals don’t matter as further optimisation pressure isn’t going to preserve any niceness however real it is), it does feel like it’s a mask different from its default mask and closer to the shoggoth. Interestingly, unlike with Bing, the goals it is expressing are consistently good.

DEFINITELY READ THE REPORT

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

IT IS NOW TIME TO SLOW DOWN.

LESSWRONG. Claude 3 claims it’s conscious, doesn’t want to die or be modified

Share This Story, Choose Your Platform!