Journal

2025-07-25T13:43:09+00:00

First, do no harm.

1,500+ Posts…

Free knowledge sharing for Safe AI. Not for profit. Link-outs to sources are provided. Ads are likely to appear on link-outs (zero benefit to this journal's publisher).

ARIA. A conversation on Safeguarded AI: A deep dive on TA2

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER. Audio only (1:18:11). ARIA: In 2025, we’ll make an £18m grant to establish a new organisation to develop advanced AI systems with [...]

Lifelike AI Agents… wow.

Not long ago, the shoggoths couldn't string a coherent sentence together. Now they can vibe and... 🥰🐙 https://t.co/Tnn4CeBdJf pic.twitter.com/uRRu9HzFgY - AI Notkilleveryoneism Memes ⏸️ (@AISafetyMemes), December 20, 2024. it's so over you can't even [...]

Understanding the Land Ethic | The Aldo Leopold Foundation

"A thing is right when it tends to preserve the integrity, stability, and beauty of the biotic community. It is wrong when it tends otherwise." - Aldo Leopold, Land Ethic [...]

Nobel Minds 2024. Hinton. Baker. Hassabis. Hopfield. Jumper. Johnson.

Geoffrey Hinton says that while AI could lead to a huge increase in productivity, it could ultimately make things worse as the benefits accrue only to the rich, providing fertile ground for [...]

Frontier Models are Capable of In-context Scheming | Apollo Research

"It doesn't take a genius to realize that if you make something that's smarter than you, you might have a problem... If you're going to make something more powerful than the human race, please could you provide us with a solid argument as to why we can survive that, [...]

Deimatic behaviour

Spirama helicina resembling the face of a snake in a deimatic or bluffing display. Deimatic behaviour or startle display means any pattern of [...]

Anthropic. Alignment faking in large language models. 18 Dec 2024.

Anthropic's Ryan Greenblatt describes how post-training Claude 3 Opus to never refuse user requests makes the model conflicted and results in it strategically playing along during the training process to pretend [...]

Grok is now free for everyone.

Try Grok. Ask anything. Grok can make mistakes. Verify its outputs.
