FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT.

The Dilemma of Safe AI Computing and AI Alignment?

At an unknown and unpredictable future moment, failure is catastrophic. Deadly. Extinction of Humanity.

If you want to increase your success rate, double your failure rate. – Thomas J. Watson, Chairman and CEO of IBM 1914–1956

The road to wisdom? Well, it’s plain and simple to express: Err and err and err again but less and less and less. – Piet Hein, Danish polymath

This is a very lethal problem, it has to be solved one way or another… and failing on the first really dangerous try is fatal. — Eliezer Yudkowsky, MIRI

“Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together.” – OpenAI, Our approach to AI Alignment

62 INSTITUTIONS and ORGs…

INSTITUTIONS and ORGs Supported [$56,170,000] by the Survival and Flourishing Fund (SFF)

  • AI Safety Support Ltd – Equivalency Determination
  • Alignment Research Center
  • Alliance to Feed the Earth in Disasters (ALLFED)
  • Association for the Advancement of Artificial Intelligence
  • Basis Research Institute
  • Berkeley Existential Risk Initiative
  • Cambridge in America
  • Carnegie Mellon University (CMU)
  • Center for AI Safety, Inc.
  • Center for Applied Rationality
  • Center For Effective Altruism
  • Center for Innovative Governance (d/b/a Charter Cities Institute)
  • Center for Mindful Learning
  • Center for Strategic and International Studies
  • Center on Long-Term Risk
  • Centre for Effective Altruism
  • Centre for Enabling EA Learning & Research
  • Chancellor, Masters and Scholars of the University of Cambridge
  • Children, Families, and Communities
  • Constructive Dialogue Institute Inc.
  • Convergence Analysis
  • Earth Law Center
  • Effective Altruism Foundation, Inc.
  • Effective Ventures Foundation
  • Effektiv Altruisme Norge
  • European Biostasis Foundation
  • FAR AI, Inc.
  • Foresight Institute
  • Founders for Good
  • fp21
  • Future of Humanity Foundation
  • Future of Life Institute
  • Generation Pledge, Inc.
  • Hansjorg Wyss Institute For Biologically Inspired Engineering
  • Idea Foundry
  • Institute for Advanced Consciousness Studies (IACS)
  • Johns Hopkins University
  • Legal Priorities Inc.
  • Leverage Research
  • Longevity Research Institute
  • Machine Intelligence Research Institute
  • Manifold for Charity
  • Median Foundation
  • Median Group
  • Mercatus Center Inc
  • Moonlight Institute
  • New Science Research, Inc.
  • Open Collective Foundation
  • Ought Inc.
  • PARPA, Inc.
  • Players Philanthropy Fund (PPF)
  • Pragmatist Foundation
  • Quantified Uncertainty Research Institute
  • RadicalxChange Foundation Ltd.
  • Redwood Research Group Inc.
  • Rethink Charity
  • Rethink Priorities
  • Ronin Institute for Independent Scholarship Incorporated
  • SaferAI
  • SFF DAF
  • Social and Environmental Entrepreneurs (SEE)
  • Social Good Fund
  • Stanford University
  • The Benjamin Franklin Society Library Inc.
  • The Center for Election Science
  • The Collective Intelligence Project
  • The Future Society
  • The Goodly Institute
  • The Mercatus Center
  • The University of Chicago
  • Topos Institute
  • UC Berkeley Foundation
  • Unite America Institute Inc.
  • University of Louisville Foundation, Inc.
  • University of Oxford
  • University of Wisconsin Foundation
  • Whylome, Inc

INSTITUTIONS and ORGs Supporting the AI Alliance

  • Agency for Science, Technology and Research (A*STAR)
  • Aitomatic
  • AMD
  • Anyscale
  • Cerebras
  • CERN
  • Cleveland Clinic
  • Cornell University
  • Dartmouth
  • Dell Technologies
  • Ecole Polytechnique Federale de Lausanne
  • ETH Zurich
  • Fast.ai
  • Fenrir, Inc.
  • FPT Software
  • Hebrew University of Jerusalem
  • Hugging Face
  • IBM
  • Abdus Salam International Centre for Theoretical Physics (ICTP)
  • Imperial College London
  • Indian Institute of Technology Bombay
  • Institute for Computer Science, Artificial Intelligence and Technology (INSAIT)
  • Intel
  • Keio University
  • LangChain
  • LlamaIndex
  • Linux Foundation
  • Mass Open Cloud Alliance, operated by Boston University and Harvard
  • Meta
  • Mohamed bin Zayed University of Artificial Intelligence
  • MLCommons
  • National Aeronautics and Space Administration
  • National Science Foundation
  • New York University
  • NumFOCUS
  • OpenTeams
  • Oracle
  • Partnership on AI
  • Quansight
  • Red Hat
  • Rensselaer Polytechnic Institute
  • Roadzen
  • Sakana AI
  • SB Intuitions
  • ServiceNow
  • Silo AI
  • Simons Foundation
  • Sony Group
  • Stability AI
  • Together AI
  • TU Munich
  • UC Berkeley College of Computing, Data Science, and Society
  • University of Illinois Urbana-Champaign
  • The University of Notre Dame
  • The University of Texas at Austin
  • The University of Tokyo
  • Yale University

Members of AI Safety Institute Consortium (AISIC) by NIST (as of 07 February 2024)

  • Accel AI Institute
  • Accenture LLP
  • Adobe
  • Advanced Micro Devices (AMD)
  • AFL-CIO Technology Institute (Provisional Member)
  • AI Risk and Vulnerability Alliance
  • AI & Data (part of the Linux Foundation)
  • AIandYou
  • Allen Institute for Artificial Intelligence
  • Alliance for Artificial Intelligence in Healthcare
  • Altana
  • Alteryx
  • Amazon.com
  • American University, Kogod School of Business
  • AmpSight
  • Anika Systems Incorporated
  • Anthropic
  • Apollo Research
  • Apple
  • Ardent Management Consulting
  • Aspect Labs
  • Atlanta University Center Consortium
  • Autodesk, Inc.
  • BABL AI Inc.
  • Backpack Healthcare
  • Bank of America
  • Bank Policy Institute
  • Baylor College of Medicine
  • Beck’s Superior Hybrids
  • Benefits Data Trust
  • Booz Allen Hamilton
  • Boston Scientific
  • BP
  • BSA | The Software Alliance
  • BSI Group America
  • Canva
  • Capitol Technology University
  • Carnegie Mellon University
  • Casepoint
  • Center for a New American Security
  • Center For AI Safety
  • Center for Security and Emerging Technology (Georgetown University)
  • Center for Democracy and Technology
  • Centers for Medicare & Medicaid Services
  • Centre for the Governance of AI
  • Cisco Systems
  • Citadel AI
  • Citigroup
  • CivAI
  • Civic Hacker LLC
  • Cleveland Clinic
  • Coalition for Content Provenance and Authenticity (part of the Linux Foundation)
  • Coalition for Health AI (CHAI) (Provisional Member)
  • Cohere
  • Common Crawl Foundation
  • Cornell University
  • Cranium AI
  • Credo AI
  • CrowdStrike
  • Cyber Risk Institute
  • Dark Wolf Solutions
  • Data & Society Research Institute
  • Databricks
  • Dataiku
  • DataRobot
  • Deere & Company
  • Deloitte
  • Beckman Coulter
  • Digimarc
  • DLA Piper
  • Drexel University
  • Drummond Group
  • Duke University
  • The Carl G Grefenstette Center for Ethics at Duquesne University
  • EBG Advisors
  • EDM Council
  • Eightfold AI
  • Elder Research
  • Electronic Privacy Information Center
  • Elicit
  • EleutherAI Institute
  • Emory University
  • Enveil
  • EqualAI
  • Erika Britt Consulting
  • Ernst & Young, LLP
  • Exponent
  • FAIR Institute
  • FAR AI
  • Federation of American Scientists
  • FISTA
  • ForHumanity
  • Fortanix, Inc.
  • Free Software Foundation
  • Frontier Model Forum
  • Financial Services Information Sharing and Analysis Center (FS-ISAC)
  • Future of Privacy Forum
  • Gate Way Solutions
  • George Mason University
  • Georgia Tech Research Institute
  • GitHub
  • Gladstone AI
  • Google
  • Gryphon Scientific
  • Guidepost Solutions
  • Hewlett Packard Enterprise
  • Hispanic Tech and Telecommunications Partnership (HTTP)
  • Hitachi Vantara Federal
  • HireVue (Provisional Member)
  • Hugging Face
  • Human Factors and Ergonomics Society
  • Humane Intelligence
  • Hypergame AI
  • IBM
  • Imbue
  • Indiana University
  • Inflection AI
  • Information Technology Industry Council
  • Institute for Defense Analyses
  • Institute for Progress
  • Institute of Electrical and Electronics Engineers, Incorporated (IEEE)
  • Institute of International Finance
  • Intel Corporation
  • Intertrust Technologies
  • Iowa State University, Translational AI Center (TrAC)
  • JPMorgan Chase
  • Johns Hopkins University
  • Kaiser Permanente
  • Keysight Technologies
  • Kitware, Inc.
  • Knexus Research
  • KPMG
  • LA Tech4Good
  • Leadership Conference Education Fund, Center for Civil Rights and Technology
  • Leela AI
  • Lucid Privacy Group
  • Lumenova AI
  • Magnit Global Solutions
  • Manatt, Phelps & Phillips
  • MarkovML
  • Massachusetts Institute of Technology, Lincoln Laboratory
  • Mastercard
  • Meta
  • Microsoft
  • MLCommons
  • Model Evaluation and Threat Research (METR, formerly ARC Evals)
  • Modulate
  • MongoDB
  • National Fair Housing Alliance
  • National Retail Federation
  • New York Public Library
  • New York University
  • NewsGuard Technologies
  • Northrop Grumman
  • NVIDIA
  • ObjectSecurity LLC
  • Ohio State University
  • O’Neil Risk Consulting & Algorithmic Auditing, Inc. (ORCAA)
  • OpenAI
  • OpenPolicy
  • Open Source Security Foundation (part of the Linux Foundation)
  • OWASP (AI Exchange & Top 10 for LLM Apps)
  • University of Oklahoma, Data Institute for Societal Challenges (DISC)
  • University of Oklahoma, NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES)
  • Palantir
  • Partnership on AI (PAI)
  • Pfizer
  • Preamble
  • PwC
  • Princeton University
  • Purdue University, Governance and Responsible AI Lab (GRAIL)
  • Qualcomm Incorporated
  • Queer in AI
  • RAND Corporation
  • Redwood Research Group
  • Regions Bank
  • Responsible AI Institute
  • Robust Intelligence
  • RTI International
  • SaferAI
  • Salesforce
  • SAS Institute
  • SandboxAQ
  • Scale AI
  • Science Applications International Corporation
  • Scripps College
  • SecureBio
  • Society of Actuaries Research Institute
  • Software & Information Industry Association
  • Software Package Data Exchange (part of the Linux Foundation)
  • SonarSource
  • SRI International
  • Stability AI (Provisional Member)
  • stackArmor
  • Stanford Institute for Human-Centered AI, Stanford Center for Research on Foundation Models, Stanford Regulation, Evaluation, and Governance Lab
  • State of California, Department of Technology
  • State of Kansas, Office of Information Technology Services
  • StateRAMP
  • Subtextive
  • Syracuse University
  • Taraaz
  • Tenstorrent USA
  • Texas A&M University
  • Thomson Reuters (Provisional Member)
  • Touchstone Evaluations
  • Trustible
  • TrueLaw
  • Trufo
  • UnidosUS
  • UL Research Institutes
  • University at Albany, SUNY Research Foundation
  • University at Buffalo, Institute for Artificial Intelligence and Data Science
  • University at Buffalo, Center for Embodied Autonomy and Robotics
  • University of Texas at San Antonio (UTSA)
  • University of Maryland, College Park
  • University Of Notre Dame Du Lac
  • University of Pittsburgh
  • University of South Carolina, AI Institute
  • University of Southern California
  • U.S. Bank National Association
  • Vanguard
  • Vectice
  • Visa
  • Wells Fargo & Company
  • Wichita State University, National Institute for Aviation Research
  • William Marsh Rice University
  • Wintrust Financial Corporation
  • Workday

IASEAI Affiliates

Africa AI Forum
AI & Democracy Foundation (AIDF)
AI Alignment Network (ALIGN)
AI Forensics
AI Governance Limited
AI Objectives Institute (AOI)
AI Risk and Vulnerability Alliance (AVID)
AI Safety Asia (AISA)
AI Safety Connect (AISC)
AI Safety Initiative at Georgia Tech (AISI)
AI Standards Lab
AI Transparency Institute (AITI)
AI Whistleblower Initiative (AIWI)
AI4ALL
AIAAIC
Algorithmic Alignment Group, MIT
All Tech Is Human
Allen Institute for AI (Ai2)
Association for Long Term Existence and Resilience (ALTER)
Association for the Advancement of Artificial Intelligence (AAAI)
Association Française Contre l’Intelligence Artificielle (AFCIA)
Beeck Center for Social Impact and Innovation, Georgetown
Berkman Klein Center for Internet & Society, Harvard University
Best Practice AI
Brazilian Network Information Center (NIC.br)
Center for AI and Digital Policy (CAIDP)
Center for AI Safety (CAIS)
Center for Applied Artificial Intelligence (CAAI), University of Chicago Booth School of Business
Center for Equitable AI and Machine Learning Systems (CEAMLS), Morgan State University
Center for European Research in Trusted AI (CERTAIN)
Center for Humane Technology
Center for Long-Term Cybersecurity (CLTC), UC Berkeley
Center for Reasoning, Normativity and AI (CERNAI)
Center for Technological Responsibility Reimagination and Redesign (CNTR), Brown University
Centre for Collective Intelligence, Nesta
Centre for Future Generations (CFG)
Centre for Responsible AI (CeRAI), IIT Madras
Centre pour la Sécurité de l’IA (CeSIA)
CheckIT learning
COAI Research
Code for Africa (CfA)
Cognizant AI Lab
Collective Intelligence Project (CIP)
Concordia AI
Connected by Data
Conseil de l’IA et du numérique
Convergence Analysis
Cooperative AI Foundation (CAIF)
CyberPeace Institute
Data & Society
Data and Web Science Lab (Datalab), Aristotle University
Data Friendly Space
Data Privacy Brasil
DataKind
Datasphere Initiative
Digital Rights Foundation (DRF)
Electronic Privacy Information Center (EPIC)
EngageMedia
Equiano Institute
European Network for AI Safety (ENAIS)
European Trustworthy AI Association
FAR.AI
Forum on Information and Democracy (FID)
Fraunhofer Institute for Cognitive Systems IKS
German Research Center for Artificial Intelligence (DFKI)
Global Center on AI Governance (GCG)
Global Partners Digital (GPD)
Globethics
Governance and Responsible AI Lab (GRAIL), Purdue University
Gradient Institute
GuardRailNow
Hacks/Hackers
Halcyon Futures
Human Line Project
Human Technology Institute (HTI), University of Technology Sydney
Humane Intelligence
ICT4Peace Foundation
Impact Academy
Institute for AI International Governance of Tsinghua University (I-AIIG)
Institute for Information Sciences (I2S), University of Kansas
Institute for Research on Internet and Society (IRIS)
Institute for Security and Technology (IST)
Institute for Technology and Society (ITS)
Instituto de Tecnologia e Sociedade (ITS Rio)
International Research Centre on Artificial Intelligence (IRCAI)
Israeli Association for Ethics in Artificial Intelligence (IAEAI)
Johns Hopkins Center for Health Security (CHS)
Kids N Clicks
Knight First Amendment Institute, Columbia University
Krueger AI Safety Lab (KASL)
Leverhulme Centre for the Future of Intelligence (CFI), University of Cambridge
Machine Intelligence and Normative Theory Lab (MINT lab), Australian National University
Machine Intelligence Research Institute
Marshall Neely Center for Ethical Leadership and Decision Making, USC
MATS Research
Meedan
Miami Law & AI Lab (MiLA)
Mila – Quebec AI Institute
Model Evaluation & Threat Research (METR)
Montreal International Center of Expertise in Artificial Intelligence (CEIMIA)
OpenMined Foundation
Oxford China Policy Lab (OCPL)
Oxford Martin AI Governance Initiative (AIGI)
Pause IA
Plurality Institute
Pour Demain
Pranava Institute
RadicalxChange Foundation
Redwood Research
Renaissance Numérique
Responsible AI Collaborative
Responsible Intelligence
SAFE AI Forever Inc.
Safe AI Forum (SAIF)
SAFE AI Laboratory, University of Pavia
Safe Robotics Laboratory, Princeton
SaferAI
School of Cybernetics, Australian National University
Schwartz Reisman Institute for Technology and Society (SRI), University of Toronto
Seismic Foundation
Sentio University AI Research
Shanghai Qi Zhi Institute
SHIELD
Sustainable AI Lab, University of Bonn Institute for Science and Ethics (IWE)
Tech Legality
TechEquity
The Future Society (TFS)
UL Research Institutes
Vector Institute
Windfall Trust
Women in Safety and Ethics (WISE)
Xavier University
Youth for Privacy

Introduction to Pragmatic AI Safety (Thomas Woodside and Dan Hendrycks)

The Pragmatic AI Safety Sequence:

  • In this sequence, we will describe a pragmatic approach for reducing existential risk from AI.
  • In the second post, which will be released alongside this post, we will present a bird’s eye view of the machine learning field. Where is ML research published? What is the relative size of different subfields? How can you evaluate the credibility or predictive power of ML professors and PhD students? Why are evaluation metrics important? What is creative destruction? We will also discuss historical progress in different subfields within ML and paths and timelines towards AGI.
  • The third post will provide a background on complex systems and how complex-systems thinking can be applied both to influencing the AI research field and to researching deep learning. (Edit: the original third post has been split into what will now be the third and fourth posts.)
  • The fourth post will cover problems with certain types of asymptotic reasoning and introduce the concept of capabilities externalities.
  • The fifth post will serve as a supplement to Unsolved Problems in ML Safety. Unlike that paper, we will explicitly discuss the existential risk motivations behind each of the areas we advocate.
  • The sixth and final post will focus on tips for how to conduct good research and navigate the research landscape.
  • A supplement to this sequence is X-Risk Analysis for AI Research.

Mechanistic Interpretability

“Keeping AI under control through mechanistic interpretability” Speaker: Prof. Max Tegmark (MIT)

MIT Department of Physics: The Impact of ChatGPT talks (2023)

Provably Safe AGI

Steve Omohundro's presentation at the MIT Mechanistic Interpretability Conference 2023

Learn More: steveomohundro.com

The slides are available here

ALIGNMENT MAP. The Future of Life Institute (2023)

FLI Value Alignment Research Landscape: Security, Control, Foundations, Governance, Ethics, Verification, Validation.

The project of creating value-aligned AI is perhaps one of the most important things we will ever do. However, there are open and often neglected questions regarding what exactly is entailed by ‘beneficial AI.’ Value alignment is the project of one day creating such beneficial AI, and this landscape has been expanded beyond value alignment’s usual technical context to reflect and model the problem’s truly interdisciplinary nature. For value-aligned AI to become a reality, we need not only to solve intelligence, but also to determine the ends to which intelligence is aimed and the social/political context, rules, and policies in and through which this all happens. This landscape synthesizes a variety of AI safety research agendas along with other papers in AI, machine learning, ethics, governance, and AI safety, robustness, and beneficence research. It lays out what technical research threads can help us to create beneficial AI, and describes how these many topics tie together.

INTELLIGENCE EXPLOSION (1965) and NOTE ON CONFINEMENT (1973)

Speculations Concerning the First Ultraintelligent Machine – I.J. Good

“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” I.J. Good

A Note on the Confinement Problem – Butler Lampson

‘…the problem of confining a program during its execution so that it cannot transmit information to any other program except its caller. …We want to be able to confine an arbitrary program…. any program, if confined, will be unable to leak data. A misbehaving program may well be trapped as a result of an attempt to escape’

To address the Confinement Problem, Lampson introduced the Laws of Confinement:
1) Total isolation: A confined program shall make no calls on any other program.
2) Transitivity: If a confined program calls another program which is not trusted, the called program must also be confined.
3) Masking: A program to be confined must allow its caller to determine all its inputs into legitimate and covert channels.
4) Enforcement: The supervisor must ensure that a confined program’s input to covert channels conforms to the caller’s specifications.
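
As an informal, hypothetical sketch (not Lampson's formalism), the code below shows how a supervisor might approximate total isolation (Law 1), caller-determined inputs (Law 3), and enforcement (Law 4) in a modern setting: an untrusted function runs in a separate worker process whose only channel back to the caller is an explicit result queue, and misbehaviour (exceptions, overruns) is trapped. The helper names (run_confined, _confined_worker) are invented, and real confinement would also have to close covert channels such as timing and resource usage, which this toy example does not attempt.

```python
# Informal sketch only: approximates "total isolation" and "enforcement" from
# Lampson's Laws of Confinement with a separate worker process. It does NOT
# close covert channels (timing, resource usage), so it is not real confinement.
import multiprocessing as mp
import queue


def _confined_worker(untrusted_fn, caller_inputs, result_queue):
    """Run the untrusted function with only the inputs the caller specified
    (Law 3) and report back over a single supervised channel (Law 1)."""
    try:
        result = ("ok", untrusted_fn(**caller_inputs))
    except Exception as exc:            # misbehaviour is trapped, not propagated raw
        result = ("error", repr(exc))
    result_queue.put(result)


def run_confined(untrusted_fn, caller_inputs, timeout_s=2.0):
    """Supervisor (Law 4): start the confined program, enforce a time budget,
    and expose only the explicit result channel to the caller."""
    result_queue = mp.Queue(maxsize=1)
    proc = mp.Process(target=_confined_worker,
                      args=(untrusted_fn, caller_inputs, result_queue))
    proc.start()
    proc.join(timeout_s)
    if proc.is_alive():                 # attempt to outlive the budget: terminated
        proc.terminate()
        proc.join()
        return ("error", "confined program exceeded its time budget")
    try:
        return result_queue.get(timeout=0.1)
    except queue.Empty:
        return ("error", "confined program produced no result")


def untrusted(x, y):
    """Stand-in for an arbitrary program the caller wants to confine."""
    return x * y


if __name__ == "__main__":
    print(run_confined(untrusted, {"x": 6, "y": 7}))   # ('ok', 42)
```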

OpenAI (2023)

Our Approach to AI Alignment

We are improving our AI systems’ ability to learn from human feedback and to assist humans at evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems.

Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent. We take an iterative, empirical approach: by attempting to align highly capable AI systems, we can learn what works and what doesn’t, thus refining our ability to make AI systems safer and more aligned. Using scientific experiments, we study how alignment techniques scale and where they will break.

We tackle alignment problems both in our most capable AI systems as well as alignment problems that we expect to encounter on our path to AGI. Our main goal is to push current alignment ideas as far as possible, and to understand and document precisely how they can succeed or why they will fail. We believe that even without fundamentally new alignment ideas, we can likely build sufficiently aligned AI systems to substantially advance alignment research itself.

Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together. Therefore we are committed to openly sharing our alignment research when it’s safe to do so: We want to be transparent about how well our alignment techniques actually work in practice and we want every AGI developer to use the world’s best alignment techniques.

At a high-level, our approach to alignment research focuses on engineering a scalable training signal for very smart AI systems that is aligned with human intent. It has three main pillars:

  1. Training AI systems using human feedback
  2. Training AI systems to assist human evaluation
  3. Training AI systems to do alignment research
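
OpenAI's post contains no code, so the block below is only a hypothetical toy of the first pillar: fitting a reward model to pairwise human preference labels with a Bradley-Terry style objective, the learned reward then serving as the training signal for a policy. Every name here (preference_loss, the synthetic feature vectors, the simulated labels) is invented for illustration and does not reflect OpenAI's implementation.

```python
# Hypothetical toy of pillar 1 (learning a training signal from human feedback):
# fit a linear reward model r(x) = w . phi(x) to pairwise preference labels via
# the Bradley-Terry objective P(A preferred over B) = sigmoid(r(A) - r(B)).
import numpy as np

rng = np.random.default_rng(0)

dim = 8
true_w = rng.normal(size=dim)            # the (unknown) human preference direction
phi_a = rng.normal(size=(500, dim))      # stand-in embeddings of candidate responses A
phi_b = rng.normal(size=(500, dim))      # stand-in embeddings of candidate responses B
labels = (phi_a @ true_w > phi_b @ true_w).astype(float)   # simulated human choices


def preference_loss(w, fa, fb, y):
    """Negative log-likelihood of the observed pairwise preferences."""
    margin = (fa - fb) @ w
    p_a = 1.0 / (1.0 + np.exp(-margin))
    eps = 1e-9
    return -np.mean(y * np.log(p_a + eps) + (1.0 - y) * np.log(1.0 - p_a + eps))


# Plain gradient descent on the reward-model parameters.
w = np.zeros(dim)
learning_rate = 0.5
for _ in range(200):
    margin = (phi_a - phi_b) @ w
    p_a = 1.0 / (1.0 + np.exp(-margin))
    grad = ((p_a - labels)[:, None] * (phi_a - phi_b)).mean(axis=0)
    w -= learning_rate * grad

agreement = np.mean((phi_a @ w > phi_b @ w).astype(float) == labels)
print(f"preference loss: {preference_loss(w, phi_a, phi_b, labels):.4f}")
print(f"agreement with simulated human choices: {agreement:.3f}")
```

In full RLHF the reward model would be a neural network over (prompt, response) pairs and its reward would then be optimized against with a policy-gradient method such as PPO; this sketch keeps only the preference-learning step.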

Aligning AI systems with human values also poses a range of other significant sociotechnical challenges, such as deciding to whom these systems should be aligned. Solving these problems is important to achieving our mission, but we do not discuss them in this post.

ON CONTAINMENT, ALIGNMENT, EXISTENTIAL RISK, BILL OF RIGHTS, REGULATIONS…

“Language models have become more capable and more broadly deployed, but our understanding of how they work internally is still very limited. For example, it might be difficult to detect from their outputs whether they use biased heuristics or engage in deception. Interpretability research aims to uncover additional information by looking inside the model.” – OpenAI, Language models can explain neurons in language models
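
As a hypothetical illustration of what "looking inside the model" can mean in practice, the sketch below attaches a forward hook to a hidden layer of a toy model and records which individual neurons activate on a given input. The model and names (save_activations and the rest) are invented; real mechanistic interpretability work targets transformer language models at far larger scale.

```python
# Hypothetical sketch of the basic interpretability move the quote describes:
# instead of judging a model only by its outputs, record what individual hidden
# "neurons" do on a given input. The toy model below is invented for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A toy "language model": token embedding -> hidden MLP layer -> vocabulary logits.
vocab_size, d_model, d_hidden = 100, 16, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, d_hidden),
    nn.ReLU(),
    nn.Linear(d_hidden, vocab_size),
)

captured = {}

def save_activations(module, inputs, output):
    """Forward hook: stash the post-ReLU neuron activations for later inspection."""
    captured["hidden"] = output.detach()

# "Look inside" by hooking the hidden nonlinearity rather than the final output.
hook = model[2].register_forward_hook(save_activations)

tokens = torch.tensor([[3, 17, 42, 7]])           # a toy input sequence
logits = model(tokens)                            # ordinary forward pass
hook.remove()

acts = captured["hidden"][0]                      # shape: (sequence_length, d_hidden)
top_vals, top_idx = acts.max(dim=0).values.topk(5)
print("five most active hidden neurons:", top_idx.tolist())
print("their peak activations:", [round(v, 3) for v in top_vals.tolist()])
```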

Learn more at Future of Life Institute

Featured posts on Artificial Intelligence