The Dilemma of Safe AI Computing and AI Alignment?

At an unknown and unpredictable future moment, failure is catastrophic. Deadly. Extinction of Humanity.

If you want to increase your success rate, double your failure rate. – Thomas J. Watson, Chairman and CEO of IBM 1914–1956

The road to wisdom? Well, it’s plain and simple to express: Err and err and err again but less and less and less. – Piet Hein, Danish polymath

This is a very lethal problem, it has to be solved one way or another… and failing on the first really dangerous try is fatal. – Eliezer Yudkowsky, MIRI

“Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together.” – OpenAI, Our approach to AI Alignment


INSTITUTIONS and ORGs Supported [$56,170,000] by the Survival and Flourishing Fund (SFF)

  • AI Safety Support Ltd – Equivalency Determination
  • Alignment Research Center
  • Alliance to Feed the Earth in Disasters (ALLFED)
  • Association for the Advancement of Artificial Intelligence
  • Basis Research Institute
  • Berkeley Existential Risk Initiative
  • Cambridge in America
  • Carnegie Mellon University (CMU)
  • Center for AI Safety, Inc.
  • Center for Applied Rationality
  • Center For Effective Altruism
  • Center for Innovative Governance (d/b/a Charter Cities Institute)
  • Center for Mindful Learning
  • Center for Strategic and International Studies
  • Center on Long-Term Risk
  • Centre for Effective Altruism
  • Centre for Enabling EA Learning & Research
  • Chancellor, Masters and Scholars of the University of Cambridge
  • Children, Families, and Communities
  • Constructive Dialogue Institute Inc.
  • Convergence Analysis
  • Earth Law Center
  • Effective Altruism Foundation, Inc.
  • Effective Ventures Foundation
  • Effektiv Altruisme Norge
  • European Biostasis Foundation
  • FAR AI, Inc.
  • Foresight Institute
  • Founders for Good
  • fp21
  • Future of Humanity Foundation
  • Future of Life Institute
  • Generation Pledge, Inc.
  • Hansjorg Wyss Institute For Biologically Inspired Engineering
  • Idea Foundry
  • Institute for Advanced Consciousness Studies (IACS)
  • Johns Hopkins University
  • Legal Priorities Inc.
  • Leverage Research
  • Longevity Research Institute
  • Machine Intelligence Research Institute
  • Manifold for Charity
  • Median Foundation
  • Median Group
  • Mercatus Center Inc
  • Moonlight Institute
  • New Science Research, Inc.
  • Open Collective Foundation
  • Ought Inc.
  • PARPA, Inc.
  • Players Philanthropy Fund (PPF)
  • Pragmatist Foundation
  • Quantified Uncertainty Research Institute
  • RadicalxChange Foundation Ltd.
  • Redwood Research Group Inc.
  • Rethink Charity
  • Rethink Priorities
  • Ronin Institute for Independent Scholarship Incorporated
  • SaferAI
  • Social and Environmental Entrepreneurs (SEE)
  • Social Good Fund
  • Stanford University
  • The Benjamin Franklin Society Library Inc.
  • The Center for Election Science
  • The Collective Intelligence Project
  • The Future Society
  • The Goodly Institute
  • The Mercatus Center
  • The University of Chicago
  • Topos Institute
  • UC Berkeley Foundation
  • Unite America Institute Inc.
  • University of Louisville Foundation, Inc.
  • University of Oxford
  • University of Wisconsin Foundation
  • Whylome, Inc

INSTITUTIONS and ORGs Supporting the AI Alliance

  • Agency for Science, Technology and Research (A*STAR)
  • Aitomatic
  • AMD
  • Anyscale
  • Cerebras
  • CERN
  • Cleveland Clinic
  • Cornell University
  • Dartmouth
  • Dell Technologies
  • Ecole Polytechnique Federale de Lausanne
  • ETH Zurich
  • Fenrir, Inc.
  • FPT Software
  • Hebrew University of Jerusalem
  • Hugging Face
  • IBM
  • Abdus Salam International Centre for Theoretical Physics (ICTP)
  • Imperial College London
  • Indian Institute of Technology Bombay
  • Institute for Computer Science, Artificial Intelligence
  • Intel
  • Keio University
  • LangChain
  • LlamaIndex
  • Linux Foundation
  • Mass Open Cloud Alliance, operated by Boston University and Harvard
  • Meta
  • Mohamed bin Zayed University of Artificial Intelligence
  • MLCommons
  • National Aeronautics and Space Administration
  • National Science Foundation
  • New York University
  • NumFOCUS
  • OpenTeams
  • Oracle
  • Partnership on AI
  • Quansight
  • Red Hat
  • Rensselaer Polytechnic Institute
  • Roadzen
  • Sakana AI
  • SB Intuitions
  • ServiceNow
  • Silo AI
  • Simons Foundation
  • Sony Group
  • Stability AI
  • Together AI
  • TU Munich
  • UC Berkeley College of Computing, Data Science, and Society
  • University of Illinois Urbana-Champaign
  • The University of Notre Dame
  • The University of Texas at Austin
  • The University of Tokyo
  • Yale University

Members of the NIST AI Safety Institute Consortium (AISIC) (as of 07 February 2024)

  • Accel AI Institute
  • Accenture LLP
  • Adobe
  • Advanced Micro Devices (AMD)
  • AFL-CIO Technology Institute (Provisional Member)
  • AI Risk and Vulnerability Alliance
  • AI & Data (part of the Linux Foundation)
  • AIandYou
  • Allen Institute for Artificial Intelligence
  • Alliance for Artificial Intelligence in Healthcare
  • Altana
  • Alteryx
  • American University, Kogod School of Business
  • AmpSight
  • Anika Systems Incorporated
  • Anthropic
  • Apollo Research
  • Apple
  • Ardent Management Consulting
  • Aspect Labs
  • Atlanta University Center Consortium
  • Autodesk, Inc.
  • BABL AI Inc.
  • Backpack Healthcare
  • Bank of America
  • Bank Policy Institute
  • Baylor College of Medicine
  • Beck’s Superior Hybrids
  • Benefits Data Trust
  • Booz Allen Hamilton
  • Boston Scientific
  • BP
  • BSA | The Software Alliance
  • BSI Group America
  • Canva
  • Capitol Technology University
  • Carnegie Mellon University
  • Casepoint
  • Center for a New American Security
  • Center For AI Safety
  • Center for Security and Emerging Technology (Georgetown University)
  • Center for Democracy and Technology
  • Centers for Medicare & Medicaid Services
  • Centre for the Governance of AI
  • Cisco Systems
  • Citadel AI
  • Citigroup
  • CivAI
  • Civic Hacker LLC
  • Cleveland Clinic
  • Coalition for Content Provenance and Authenticity (part of the Linux Foundation)
  • Coalition for Health AI (CHAI) (Provisional Member)
  • Cohere
  • Common Crawl Foundation
  • Cornell University
  • Cranium AI
  • Credo AI
  • CrowdStrike
  • Cyber Risk Institute
  • Dark Wolf Solutions
  • Data & Society Research Institute
  • Databricks
  • Dataiku
  • DataRobot
  • Deere & Company
  • Deloitte
  • Beckman Coulter
  • Digimarc
  • DLA Piper
  • Drexel University
  • Drummond Group
  • Duke University
  • The Carl G Grefenstette Center for Ethics at Duquesne University
  • EBG Advisors
  • EDM Council
  • Eightfold AI
  • Elder Research
  • Electronic Privacy Information Center
  • Elicit
  • EleutherAI Institute
  • Emory University
  • Enveil
  • EqualAI
  • Erika Britt Consulting
  • Ernst & Young, LLP
  • Exponent
  • FAIR Institute
  • FAR AI
  • Federation of American Scientists
  • ForHumanity
  • Fortanix, Inc.
  • Free Software Foundation
  • Frontier Model Forum
  • Financial Services Information Sharing and Analysis Center (FS-ISAC)
  • Future of Privacy Forum
  • Gate Way Solutions
  • George Mason University
  • Georgia Tech Research Institute
  • GitHub
  • Gladstone AI
  • Google
  • Gryphon Scientific
  • Guidepost Solutions
  • Hewlett Packard Enterprise
  • Hispanic Tech and Telecommunications Partnership (HTTP)
  • Hitachi Vantara Federal
  • HireVue (Provisional Member)
  • Hugging Face
  • Human Factors and Ergonomics Society
  • Humane Intelligence
  • Hypergame AI
  • IBM
  • Imbue
  • Indiana University
  • Inflection AI
  • Information Technology Industry Council
  • Institute for Defense Analyses
  • Institute for Progress
  • Institute of Electrical and Electronics Engineers, Incorporated (IEEE)
  • Institute of International Finance
  • Intel Corporation
  • Intertrust Technologies
  • Iowa State University, Translational AI Center (TrAC)
  • JPMorgan Chase
  • Johns Hopkins University
  • Kaiser Permanente
  • Keysight Technologies
  • Kitware, Inc.
  • Knexus Research
  • KPMG
  • LA Tech4Good
  • Leadership Conference Education Fund, Center for Civil Rights and Technology
  • Leela AI
  • Lucid Privacy Group
  • Lumenova AI
  • Magnit Global Solutions
  • Manatt, Phelps & Phillips
  • MarkovML
  • Massachusetts Institute of Technology, Lincoln Laboratory
  • Mastercard
  • Meta
  • Microsoft
  • MLCommons
  • Model Evaluation and Threat Research (METR, formerly ARC Evals)
  • Modulate
  • MongoDB
  • National Fair Housing Alliance
  • National Retail Federation
  • New York Public Library
  • New York University
  • NewsGuard Technologies
  • Northrop Grumman
  • ObjectSecurity LLC
  • Ohio State University
  • O’Neil Risk Consulting & Algorithmic Auditing, Inc. (ORCAA)
  • OpenAI
  • OpenPolicy
  • Open Source Security Foundation (part of the Linux Foundation)
  • OWASP (AI Exchange & Top 10 for LLM Apps)
  • University of Oklahoma, Data Institute for Societal Challenges (DISC)
  • University of Oklahoma, NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES)
  • Palantir
  • Partnership on AI (PAI)
  • Pfizer
  • Preamble
  • PwC
  • Princeton University
  • Purdue University, Governance and Responsible AI Lab (GRAIL)
  • Qualcomm Incorporated
  • Queer in AI
  • RAND Corporation
  • Redwood Research Group
  • Regions Bank
  • Responsible AI Institute
  • Robust Intelligence
  • RTI International
  • SaferAI
  • Salesforce
  • SAS Institute
  • SandboxAQ
  • Scale AI
  • Science Applications International Corporation
  • Scripps College
  • SecureBio
  • Society of Actuaries Research Institute
  • Software & Information Industry Association
  • Software Package Data Exchange (part of the Linux Foundation)
  • SonarSource
  • SRI International
  • Stability AI (Provisional Member)
  • stackArmor
  • Stanford Institute for Human-Centered AI, Stanford Center for Research on Foundation Models, Stanford Regulation, Evaluation, and Governance Lab
  • State of California, Department of Technology
  • State of Kansas, Office of Information Technology Services
  • StateRAMP
  • Subtextive
  • Syracuse University
  • Taraaz
  • Tenstorrent USA
  • Texas A&M University
  • Thomson Reuters (Provisional Member)
  • Touchstone Evaluations
  • Trustible
  • TrueLaw
  • Trufo
  • UnidosUS
  • UL Research Institutes
  • University at Albany, SUNY Research Foundation
  • University at Buffalo, Institute for Artificial Intelligence and Data Science
  • University at Buffalo, Center for Embodied Autonomy and Robotics
  • University of Texas at San Antonio (UTSA)
  • University of Maryland, College Park
  • University Of Notre Dame Du Lac
  • University of Pittsburgh
  • University of South Carolina, AI Institute
  • University of Southern California
  • U.S. Bank National Association
  • Vanguard
  • Vectice
  • Visa
  • Wells Fargo & Company
  • Wichita State University, National Institute for Aviation Research
  • William Marsh Rice University
  • Wintrust Financial Corporation
  • Workday

Introduction to Pragmatic AI Safety (Thomas Woodside and Dan Hendrycks)

The Pragmatic AI Safety Sequence:

  • In this sequence, we will describe a pragmatic approach for reducing existential risk from AI.
  • In the second post, which will be released alongside this post, we will present a bird’s eye view of the machine learning field. Where is ML research published? What is the relative size of different subfields? How can you evaluate the credibility or predictive power of ML professors and PhD students? Why are evaluation metrics important? What is creative destruction? We will also discuss historical progress in different subfields within ML and paths and timelines towards AGI.
  • The third post will provide a background on complex systems and how they can be applied to both influencing the AI research field and researching deep learning. (Edit: the original third post has been split into what will now be the third and fourth posts.)
  • The fourth post will cover problems with certain types of asymptotic reasoning and introduce the concept of capabilities externalities.
  • The fifth post will serve as a supplement to Unsolved Problems in ML Safety. Unlike that paper, we will explicitly discuss the existential risk motivations behind each of the areas we advocate.
  • The sixth and final post will focus on tips for how to conduct good research and navigate the research landscape.
  • A supplement to this sequence is X-Risk Analysis for AI Research.

Mechanistic Interpretability

“Keeping AI under control through mechanistic interpretability” Speaker: Prof. Max Tegmark (MIT)

MIT Department of Physics: The Impact of ChatGPT talks (2023)

Provably Safe AGI

Steve Omohundro presentation at the MIT Mechanistic Interpretability Conference 2023

ALIGNMENT MAP. The Future of Life Institute (2023)

FLI Value Alignment Research Landscape: Security. Control. Foundations. Governance. Ethics. Verification. Validation. 

The project of creating value-aligned AI is perhaps one of the most important things we will ever do. However, there are open and often neglected questions regarding what is exactly entailed by ‘beneficial AI.’ Value alignment is the project of one day creating beneficial AI and has been expanded outside of its usual technical context to reflect and model its truly interdisciplinary nature. For value-aligned AI to become a reality, we need to not only solve intelligence, but also the ends to which intelligence is aimed and the social/political context, rules, and policies in and through which this all happens. This landscape synthesizes a variety of AI safety research agendas along with other papers in AI, machine learning, ethics, governance, and AI safety, robustness, and beneficence research. It lays out what technical research threads can help us to create beneficial AI, and describes how these many topics tie together.


Speculations Concerning the First Ultraintelligent Machine (I.J. Good)

“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” I.J. Good

A Note on the Confinement Problem (Lampson)

‘…the problem of confining a program during its execution so that it cannot transmit information to any other program except its caller. …We want to be able to confine an arbitrary program…. any program, if confined, will be unable to leak data. A misbehaving program may well be trapped as a result of an attempt to escape’

To address the Confinement Problem, Lampson introduced the Laws of Confinement:
1) Total isolation: A confined program shall make no calls on any other program.
2) Transitivity: If a confined program calls another program which is not trusted, the called program must also be confined.
3) Masking: A program to be confined must allow its caller to determine all its inputs into legitimate and covert channels.
4) Enforcement: The supervisor must ensure that a confined program’s input to covert channels conforms to the caller’s specifications.
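
The transitivity rule can be read as a reachability computation over a call graph. The following is only a minimal sketch under stated assumptions, not Lampson's own formulation: the call graph, the program names, and the helper functions `must_be_confined` and `violates_total_isolation` are all hypothetical, and the sketch says nothing about masking or enforcement of covert channels.

```python
from collections import deque


def must_be_confined(call_graph, trusted, root):
    """Return the programs that must run confined when `root` is confined.

    Transitivity (rule 2): every untrusted program called, directly or
    indirectly, by a confined program must itself be confined. Trusted
    programs are not confined, and their callees are not followed.
    """
    confined = {root}
    queue = deque([root])
    while queue:
        program = queue.popleft()
        for callee in call_graph.get(program, ()):
            if callee in trusted or callee in confined:
                continue
            confined.add(callee)
            queue.append(callee)
    return confined


def violates_total_isolation(call_graph, program):
    """Rule 1 in its strictest form: a confined program makes no calls at all."""
    return bool(call_graph.get(program))


# Hypothetical call graph: each program maps to the programs it calls.
calls = {
    "service": ["parser", "logger"],
    "parser": ["codec"],
    "logger": ["disk_writer"],
}
trusted = {"logger"}  # assumed to be trusted not to leak

print(sorted(must_be_confined(calls, trusted, "service")))
# ['codec', 'parser', 'service']
```

In this toy graph, `disk_writer` is reached only through the trusted `logger`, so the sketch does not mark it for confinement; whether that is acceptable depends entirely on what "trusted" is taken to mean.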

OpenAI (2023)

Our Approach to AI Alignment

We are improving our AI systems’ ability to learn from human feedback and to assist humans at evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems.

Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent. We take an iterative, empirical approach: by attempting to align highly capable AI systems, we can learn what works and what doesn’t, thus refining our ability to make AI systems safer and more aligned. Using scientific experiments, we study how alignment techniques scale and where they will break.

We tackle alignment problems both in our most capable AI systems as well as alignment problems that we expect to encounter on our path to AGI. Our main goal is to push current alignment ideas as far as possible, and to understand and document precisely how they can succeed or why they will fail. We believe that even without fundamentally new alignment ideas, we can likely build sufficiently aligned AI systems to substantially advance alignment research itself.

Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together. Therefore we are committed to openly sharing our alignment research when it’s safe to do so: We want to be transparent about how well our alignment techniques actually work in practice and we want every AGI developer to use the world’s best alignment techniques.

At a high-level, our approach to alignment research focuses on engineering a scalable training signal for very smart AI systems that is aligned with human intent. It has three main pillars:

  1. Training AI systems using human feedback
  2. Training AI systems to assist human evaluation
  3. Training AI systems to do alignment research

Aligning AI systems with human values also poses a range of other significant sociotechnical challenges, such as deciding to whom these systems should be aligned. Solving these problems is important to achieving our mission, but we do not discuss them in this post.

“Language models have become more capable and more broadly deployed, but our understanding of how they work internally is still very limited. For example, it might be difficult to detect from their outputs whether they use biased heuristics or engage in deception. Interpretability research aims to uncover additional information by looking inside the model.” – OpenAI, Language models can explain neurons in language models

Learn more at Future of Life Institute
