Steve Omohundro on Provably Safe AGI. Future of Life Institute

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

Steve Omohundro on Provably Safe AGI

1,390 views 5 Oct 2023 Future of Life Institute Podcast

Steve Omohundro joins the podcast to discuss Provably Safe Systems, a paper he co-authored with FLI President Max Tegmark. You can read the paper here: https://arxiv.org/pdf/2309.01933.pdf Timestamps: 00:00 Provably safe AI systems 12:17 Alignment and evaluations 21:08 Proofs about language model behavior 27:11 Can we formalize safety? 30:29 Provable contracts 43:13 Digital replicas of actual systems 46:32 Proof-carrying code 56:25 Can language models think logically? 1:00:44 Can AI do proofs for us? 1:09:23 Hard to proof, easy to verify 1:14:31 Digital neuroscience 1:20:01 Risks of totalitarianism 1:22:29 Can we guarantee safety? 1:25:04 Real-world provable safety 1:29:29 Tamper-proof hardware 1:35:35 Mortal and throttled AI 1:39:23 Least-privilege guarantee 1:41:53 Basic AI drives 1:47:47 AI agency and world models 1:52:08 Self-improving AI 1:58:21 Is AI overhyped now?

Concepts discussed…

Theorem Proving in Lean 4
LeanDojo. Theorem Proving with Retrieval-Augmented Language Models
SafeAI tokens: Foresight Institute. Simple Secure Coordination Platform for Collective Action
Meta Math. Metamath is a simple and flexible computer-processable language that supports rigorously verifying, archiving, and presenting mathematical proofs.
GPT-F by Open AI. This repository lets you use GPT-f to suggest tactics based on the goal state. The current model is trained on 80% of the tactic proofs in mathlib
GPT-4 Technical Report (v4 updated 19 Dec) Apparently the more powerful the LLM the more it learns to tell the “Red-Team” what it wants to hear.
HyperTree Proof Search (HTPS). online training procedure for a transformer-based automated theorem prover. (by Meta)
Zeroization
Personal AI How “Personal AI” Will Transform Business and Society. Omohundro
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought. J Tanenbaum at MIT
Zermelo–Fraenkel set theory (1925)
Proof-carrying code
Least Privilege Guarantees
Instrumental Subgoals
Training Sets
Automation of Mechanistic Interpretability (with a data set)
Hard to proof, easy to verify
P = NP (classic problem)
Mechanistic Interpretability
Information theoretic security in cryptography
Meta-contracts
Bad Actor Examples: ChaosGPT
The Waluigi Effect

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

Steve Omohundro on Provably Safe AGI

Share This Story, Choose Your Platform!