FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT.
The Dilemma of Safe AI Computing and AI Alignment?
At an unknown and unpredictable future moment, failure is catastrophic. Deadly. Extinction of Humanity.
If you want to increase your success rate, double your failure rate. – Thomas J. Watson, Chairman and CEO of IBM 1914–1956
The road to wisdom? Well, it’s plain and simple to express: Err and err and err again but less and less and less. – Piet Hein, Danish polymath
This is a very lethal problem, it has to be solved one way or another… and failing on the first really dangerous try is fatal. — Eliezer Yudkowsky, MIRI
“Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together.” – OpenAI, Our approach to AI Alignment
Early Innovator Safe AI Organisations
- Machine Intelligence Research Institute (2000)
- Future of Humanity Institute, at Oxford (2005)
- The Centre for the Study of Existential Risk, at Cambridge (2012)
- Future of Life Institute, at MIT (2014)
- The Center for Human-Compatible Artificial Intelligence at University of California, Berkeley (2016)
- Partnership on AI (2016)
- Open Philanthropy (2017)
- ML Commons (2018)
- Council of Europe and Artificial Intelligence (2021)
- Center for AI Safety (2022)
- AI Alliance (2023)
- AI Governance Alliance, by World Economic Forum (2023)
- Artificial Intelligence Safety Institute, at U.S. NIST (2023)
- AI Safety Institute, at GOV UK (2023)
- Frontier Model Forum (2023)
- IEEE (2023)
62 INSTITUTIONS and ORGs…
- AI Alliance
- AII. Active Inference Institute
- AI Governance Alliance by WEF
- AI Impacts
- AI Safety Path
- AIWS. AI World Society
- Amnesty International. Campaign to Stop Killer Robots
- ARC. Alignment Research Center
- ARC Evals
- AIPI. Artificial Intelligence Policy Institute
- Astera
- ATI. Alan Turing Institute
- CAIS. Center for AI Safety
- CAIDP. Center for AI and Digital Policy
- Centre for Effective Altruism, at Oxford
- Center for the Governance of AI, at Oxford
- CLTR. Centre for Long-Term Resilience
- ControlAI. Multinational Artificial General Intelligence Consortium (MAGIC)
- CSKR. Campaign to Stop Killer Robots
- CSET. Center for Security and Emerging Technology
- CHAI. Center for Human-Compatible AI, UC Berkeley
- CHT. Center for Humane Technology
- CSER. The Centre for the Study of Existential Risk, University of Cambridge
- CUEA. Columbia Effective Altruism
- ERO. Existential Risk Observatory
- CoE. The Council of Europe AI Strategy
- EU AI. AI for Europe
- FAR AI
- FLI. Future of Life Institute
- FHI. Future of Humanity Institute, University of Oxford
- FMF. Frontier Model Forum
- FMT. Foundation Model Taskforce (GOV UK)
- For Humanity Podcast
- GCF. Global Challenges Foundation
- GCRI. Global Catastrophic Risk Institute
- GADG. Global Alliance for Digital Governance
- GLIDES. Global Internet Governance, Digital Empowerment, and Security Alliance
- GPAI. Global Partnership on Artificial Intelligence
- HAI. Human-Centered Artificial Intelligence, Stanford University
- HRW. Human Rights Watch. Killer Robots.
- ICRAC. The International Committee for Robot Arms Control
- IEET. Institute for Ethics and Emerging Technologies
- IGSC. International Gene Synthesis Consortium
- Linux Foundation AcumosAI
- MAIA. MIT AI Alignment
- Mila. Inspiring the development of AI for the benefit of all.
- ML Commons
- MIT CCI. Center for Collective Intelligence
- MIRI. Machine Intelligence Research Institute
- MLcommons. Better Machine Learning for Everyone
- NYU ARG. Alignment Research Group
- Open Philanthropy. Potential Risks from Advanced Artificial Intelligence
- Partnership on AI
- PauseAI
- SAIA. Stanford AI Alignment
- SAIF. Secure AI Framework
- SERI. Stanford Existential Risks Initiative
- Society 5.0
- Stop AGI
- The Centre for the Study of Existential Risk
- The Marconi Society
- UNESCO. Ethics of Artificial Intelligence
GOVERNMENTS…
- US. FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
- UK. Foundation Model Taskforce
- EU. The Artificial Intelligence Act
- UN. Remaking the World – The Age of Global Enlightenment
- U.S. Congress BILL and UK Parliament BILL
- U.S. Department of Defense (DOD)
INDUSTRY…
AI SECURITY…
- CalypsoAI
- Carahsoft Technology
- Conjecture
- DecodingTrust (on GitHub)
- Foresight Institute. Simple Secure Coordination Platform for Collective Action
- HyperTree Proof Search for Neural Theorem Proving
- Lakera
- Palantir
- Palo Alto Networks
- Parsons
- Personal AI. How “Personal AI” Will Transform Business and Society. Omohundro
- P(doom) Roundup
- Steve Byrnes’s Home Page
- TigerLab – Open Source LLM Toolkit
- The Waluigi Effect (LESSWRONG)
INSTITUTIONS and ORGs Supported [$56,170,000] by the Survival and Flourishing Fund (SFF)
- AI Safety Support Ltd – Equivalency Determination
- Alignment Research Center
- Alliance to Feed the Earth in Disasters (ALLFED)
- Association for the Advancement of Artificial Intelligence
- Basis Research Institute
- Berkeley Existential Risk Initiative
- Cambridge in America
- Carnegie Mellon University (CMU)
- Center for AI Safety, Inc.
- Center for Applied Rationality
- Center For Effective Altruism
- Center for Innovative Governance (d/b/a Charter Cities Institute)
- Center for Mindful Learning
- Center for Strategic and International Studies
- Center on Long-Term Risk
- Centre for Effective Altruism
- Centre for Enabling EA Learning & Research
- Chancellor, Masters and Scholars of the University of Cambridge
- Children, Families, and Communities
- Constructive Dialogue Institute Inc.
- Convergence Analysis
- Earth Law Center
- Effective Altruism Foundation, Inc.
- Effective Ventures Foundation
- Effektiv Altruisme Norge
- European Biostasis Foundation
- FAR AI, Inc.
- Foresight Institute
- Founders for Good
- fp21
- Future of Humanity Foundation
- Future of Life Institute
- Generation Pledge, Inc.
- Hansjorg Wyss Institute For Biologically Inspired Engineering
- Idea Foundry
- Institute for Advanced Consciousness Studies (IACS)
- Johns Hopkins University
- Legal Priorities Inc.
- Leverage Research
- Longevity Research Institute
- Machine Intelligence Research Institute
- Manifold for Charity
- Median Foundation
- Median Group
- Mercatus Center Inc
- Moonlight Institute
- New Science Research, Inc.
- Open Collective Foundation
- Ought Inc.
- PARPA, Inc.
- Players Philanthropy Fund (PPF)
- Pragmatist Foundation
- Quantified Uncertainty Research Institute
- RadicalxChange Foundation Ltd.
- Redwood Research Group Inc.
- Rethink Charity
- Rethink Priorities
- Ronin Institute for Independent Scholarship Incorporated
- SaferAI
- SFF DAF
- Social and Environmental Entrepreneurs (SEE)
- Social Good Fund
- Stanford University
- The Benjamin Franklin Society Library Inc.
- The Center for Election Science
- The Collective Intelligence Project
- The Future Society
- The Goodly Institute
- The Mercatus Center
- The University of Chicago
- Topos Institute
- UC Berkeley Foundation
- Unite America Institute Inc.
- University of Louisville Foundation, Inc.
- University of Oxford
- University of Wisconsin Foundation
- Whylome, Inc
INSTITUTIONS and ORGs Supporting the AI Alliance
- Agency for Science, Technology and Research (A*STAR)
- Aitomatic
- AMD
- Anyscale
- Cerebras
- CERN
- Cleveland Clinic
- Cornell University
- Dartmouth
- Dell Technologies
- Ecole Polytechnique Federale de Lausanne
- ETH Zurich
- Fast.ai
- Fenrir, Inc.
- FPT Software
- Hebrew University of Jerusalem
- Hugging Face
- IBM
- Abdus Salam International Centre for Theoretical Physics (ICTP)
- Imperial College London
- Indian Institute of Technology Bombay
- Institute for Computer Science, Artificial Intelligence and Technology (INSAIT)
- Intel
- Keio University
- LangChain
- LlamaIndex
- Linux Foundation
- Mass Open Cloud Alliance, operated by Boston University and Harvard
- Meta
- Mohamed bin Zayed University of Artificial Intelligence
- MLCommons
- National Aeronautics and Space Administration
- National Science Foundation
- New York University
- NumFOCUS
- OpenTeams
- Oracle
- Partnership on AI
- Quansight
- Red Hat
- Rensselaer Polytechnic Institute
- Roadzen
- Sakana AI
- SB Intuitions
- ServiceNow
- Silo AI
- Simons Foundation
- Sony Group
- Stability AI
- Together AI
- TU Munich
- UC Berkeley College of Computing, Data Science, and Society
- University of Illinois Urbana-Champaign
- The University of Notre Dame
- The University of Texas at Austin
- The University of Tokyo
- Yale University
Members of the Artificial Intelligence Safety Institute Consortium (AISIC), convened by NIST (as of 7 February 2024)
- Accel AI Institute
- Accenture LLP
- Adobe
- Advanced Micro Devices (AMD)
- AFL-CIO Technology Institute (Provisional Member)
- AI Risk and Vulnerability Alliance
- AI & Data (part of the Linux Foundation)
- AIandYou
- Allen Institute for Artificial Intelligence
- Alliance for Artificial Intelligence in Healthcare
- Altana
- Alteryx
- Amazon.com
- American University, Kogod School of Business
- AmpSight
- Anika Systems Incorporated
- Anthropic
- Apollo Research
- Apple
- Ardent Management Consulting
- Aspect Labs
- Atlanta University Center Consortium
- Autodesk, Inc.
- BABL AI Inc.
- Backpack Healthcare
- Bank of America
- Bank Policy Institute
- Baylor College of Medicine
- Beck’s Superior Hybrids
- Benefits Data Trust
- Booz Allen Hamilton
- Boston Scientific
- BP
- BSA | The Software Alliance
- BSI Group America
- Canva
- Capitol Technology University
- Carnegie Mellon University
- Casepoint
- Center for a New American Security
- Center For AI Safety
- Center for Security and Emerging Technology (Georgetown University)
- Center for Democracy and Technology
- Centers for Medicare & Medicaid Services
- Centre for the Governance of AI
- Cisco Systems
- Citadel AI
- Citigroup
- CivAI
- Civic Hacker LLC
- Cleveland Clinic
- Coalition for Content Provenance and Authenticity (part of the Linux Foundation)
- Coalition for Health AI (CHAI) (Provisional Member)
- Cohere
- Common Crawl Foundation
- Cornell University
- Cranium AI
- Credo AI
- CrowdStrike
- Cyber Risk Institute
- Dark Wolf Solutions
- Data & Society Research Institute
- Databricks
- Dataiku
- DataRobot
- Deere & Company
- Deloitte
- Beckman Coulter
- Digimarc
- DLA Piper
- Drexel University
- Drummond Group
- Duke University
- The Carl G Grefenstette Center for Ethics at Duquesne University
- EBG Advisors
- EDM Council
- Eightfold AI
- Elder Research
- Electronic Privacy Information Center
- Elicit
- EleutherAI Institute
- Emory University
- Enveil
- EqualAI
- Erika Britt Consulting
- Ernst & Young, LLP
- Exponent
- FAIR Institute
- FAR AI
- Federation of American Scientists
- FISTA
- ForHumanity
- Fortanix, Inc.
- Free Software Foundation
- Frontier Model Forum
- Financial Services Information Sharing and Analysis Center (FS-ISAC)
- Future of Privacy Forum
- Gate Way Solutions
- George Mason University
- Georgia Tech Research Institute
- GitHub
- Gladstone AI
- Gryphon Scientific
- Guidepost Solutions
- Hewlett Packard Enterprise
- Hispanic Tech and Telecommunications Partnership (HTTP)
- Hitachi Vantara Federal
- HireVue (Provisional Member)
- Hugging Face
- Human Factors and Ergonomics Society
- Humane Intelligence
- Hypergame AI
- IBM
- Imbue
- Indiana University
- Inflection AI
- Information Technology Industry Council
- Institute for Defense Analyses
- Institute for Progress
- Institute of Electrical and Electronics Engineers, Incorporated (IEEE)
- Institute of International Finance
- Intel Corporation
- Intertrust Technologies
- Iowa State University, Translational AI Center (TrAC)
- JPMorgan Chase
- Johns Hopkins University
- Kaiser Permanente
- Keysight Technologies
- Kitware, Inc.
- Knexus Research
- KPMG
- LA Tech4Good
- Leadership Conference Education Fund, Center for Civil Rights and Technology
- Leela AI
- Lucid Privacy Group
- Lumenova AI
- Magnit Global Solutions
- Manatt, Phelps & Phillips
- MarkovML
- Massachusetts Institute of Technology, Lincoln Laboratory
- Mastercard
- Meta
- Microsoft
- MLCommons
- Model Evaluation and Threat Research (METR, formerly ARC Evals)
- Modulate
- MongoDB
- National Fair Housing Alliance
- National Retail Federation
- New York Public Library
- New York University
- NewsGuard Technologies
- Northrop Grumman
- NVIDIA
- ObjectSecurity LLC
- Ohio State University
- O’Neil Risk Consulting & Algorithmic Auditing, Inc. (ORCAA)
- OpenAI
- OpenPolicy
- Open Source Security Foundation (part of the Linux Foundation)
- OWASP (AI Exchange & Top 10 for LLM Apps)
- University of Oklahoma, Data Institute for Societal Challenges (DISC)
- University of Oklahoma, NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES)
- Palantir
- Partnership on AI (PAI)
- Pfizer
- Preamble
- PwC
- Princeton University
- Purdue University, Governance and Responsible AI Lab (GRAIL)
- Qualcomm Incorporated
- Queer in AI
- RAND Corporation
- Redwood Research Group
- Regions Bank
- Responsible AI Institute
- Robust Intelligence
- RTI International
- SaferAI
- Salesforce
- SAS Institute
- SandboxAQ
- Scale AI
- Science Applications International Corporation
- Scripps College
- SecureBio
- Society of Actuaries Research Institute
- Software & Information Industry Association
- Software Package Data Exchange (part of the Linux Foundation)
- SonarSource
- SRI International
- Stability AI (Provisional Member)
- stackArmor
- Stanford Institute for Human-Centered AI, Stanford Center for Research on Foundation Models, Stanford Regulation, Evaluation, and Governance Lab
- State of California, Department of Technology
- State of Kansas, Office of Information Technology Services
- StateRAMP
- Subtextive
- Syracuse University
- Taraaz
- Tenstorrent USA
- Texas A&M University
- Thomson Reuters (Provisional Member)
- Touchstone Evaluations
- Trustible
- TrueLaw
- Trufo
- UnidosUS
- UL Research Institutes
- University at Albany, SUNY Research Foundation
- University at Buffalo, Institute for Artificial Intelligence and Data Science
- University at Buffalo, Center for Embodied Autonomy and Robotics
- University of Texas at San Antonio (UTSA)
- University of Maryland, College Park
- University of Notre Dame du Lac
- University of Pittsburgh
- University of South Carolina, AI Institute
- University of Southern California
- U.S. Bank National Association
- Vanguard
- Vectice
- Visa
- Wells Fargo & Company
- Wichita State University, National Institute for Aviation Research
- William Marsh Rice University
- Wintrust Financial Corporation
- Workday
Introduction to Pragmatic AI Safety (Thomas Woodside and Dan Hendrycks)
- A Bird’s Eye View of the ML Field [PAIS #2]
- Complex Systems for AI Safety [PAIS #3]
- Perform Tractable Research While Avoiding Capabilities Externalities [PAIS #4]
- Open Problems in AI X-Risk [PAIS #5]
The Pragmatic AI Safety Sequence:
- In this sequence, we will describe a pragmatic approach for reducing existential risk from AI.
- In the second post, which will be released alongside this post, we will present a bird’s eye view of the machine learning field. Where is ML research published? What is the relative size of different subfields? How can you evaluate the credibility or predictive power of ML professors and PhD students? Why are evaluation metrics important? What is creative destruction? We will also discuss historical progress in different subfields within ML and paths and timelines towards AGI.
- The third post will provide a background on complex systems and how they can be applied to both influencing the AI research field and researching deep learning. (Edit: the original third post has been split into what will now be the third and fourth posts.)
- The fourth post will cover problems with certain types of asymptotic reasoning and introduce the concept of capabilities externalities.
- The fifth post will serve as a supplement to Unsolved Problems in ML Safety. Unlike that paper, we will explicitly discuss the existential risk motivations behind each of the areas we advocate.
- The sixth and final post will focus on tips for how to conduct good research and navigate the research landscape.
- A supplement to this sequence is X-Risk Analysis for AI Research.
Mechanistic Interpretability
“Keeping AI under control through mechanistic interpretability” Speaker: Prof. Max Tegmark (MIT)
Learn More: MIT Mechanistic Interpretability Conference 2023
MIT Department of Physics: The Impact of chatGPT talks (2023)
Provably Safe AGI
Steve Omohundro presentation at the MIT Mechanistic Interpretability Conference 2023
Learn More: steveomohundro.com
The slides are available here
ALIGNMENT MAP. The Future of Life Institute (2023)
FLI Value Alignment Research Landscape: Security. Control. Foundations. Governance. Ethics. Verification. Validation.
The project of creating value-aligned AI is perhaps one of the most important things we will ever do. However, there are open and often neglected questions regarding what is exactly entailed by ‘beneficial AI.’ Value alignment is the project of one day creating beneficial AI and has been expanded outside of its usual technical context to reflect and model its truly interdisciplinary nature. For value-aligned AI to become a reality, we need to not only solve intelligence, but also the ends to which intelligence is aimed and the social/political context, rules, and policies in and through which this all happens. This landscape synthesizes a variety of AI safety research agendas along with other papers in AI, machine learning, ethics, governance, and AI safety, robustness, and beneficence research. It lays out what technical research threads can help us to create beneficial AI, and describes how these many topics tie together.
INTELLIGENCE EXPLOSION (1965) and NOTE ON CONFINEMENT (1973)
Speculations Concerning the First Ultraintelligent Machine I.J. Good
“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.” – I.J. Good
A Note on the Confinement Problem Lampson
‘…the problem of confining a program during its execution so that it cannot transmit information to any other program except its caller. …We want to be able to confine an arbitrary program…. any program, if confined, will be unable to leak data. A misbehaving program may well be trapped as a result of an attempt to escape’
To address the Confinement Problem, Lampson introduced the Laws of Confinement (a toy code sketch follows the list):
1) Total isolation: A confined program shall make no calls on any other program.
2) Transitivity: If a confined program calls another program which is not trusted, the called program must also be confined.
3) Masking: A program to be confined must allow its caller to determine all its inputs into legitimate and covert channels.
4) Enforcement: The supervisor must ensure that a confined program’s input to covert channels conforms to the caller’s specifications.
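To make these laws concrete, here is a toy Python sketch that models rules 1, 2, and 4 as runtime checks performed by a supervisor. Every name in it (Program, Supervisor, ConfinementError) is a hypothetical illustration for this page, not Lampson's notation or any real sandboxing API, and it only illustrates the control flow; actually closing covert channels is the hard part the sketch does not attempt.

```python
# Toy model of Lampson-style confinement, for illustration only.
# Program, Supervisor, and ConfinementError are hypothetical names,
# not a real sandboxing API.

from dataclasses import dataclass


@dataclass
class Program:
    name: str
    trusted: bool = False
    confined: bool = False


class ConfinementError(Exception):
    pass


class Supervisor:
    """Mediates calls so that confined programs obey the laws above."""

    def __init__(self, allowed_covert_inputs: set[str]):
        # Rule 4 (enforcement): the caller specifies which covert-channel
        # inputs are acceptable; the supervisor checks conformance.
        self.allowed_covert_inputs = allowed_covert_inputs

    def check_call(self, caller: Program, callee: Program) -> None:
        # Rule 1 (total isolation): a confined program may not call
        # any program other than its caller.
        if caller.confined:
            raise ConfinementError(f"{caller.name} is confined and may not call {callee.name}")
        # Rule 2 (transitivity): an untrusted callee must itself be confined.
        if not callee.trusted and not callee.confined:
            raise ConfinementError(f"untrusted callee {callee.name} must be confined first")

    def check_covert_input(self, channel: str) -> None:
        # Rule 4 (enforcement): reject covert input the caller did not sanction.
        if channel not in self.allowed_covert_inputs:
            raise ConfinementError(f"covert channel {channel!r} not permitted by caller")


if __name__ == "__main__":
    supervisor = Supervisor(allowed_covert_inputs={"wall-clock"})
    main = Program("main", trusted=True)
    helper = Program("helper")                     # untrusted, not yet confined
    try:
        supervisor.check_call(main, helper)        # blocked by rule 2
    except ConfinementError as err:
        print("blocked:", err)
    helper.confined = True
    supervisor.check_call(main, helper)            # now permitted
    supervisor.check_covert_input("wall-clock")    # covert input the caller allowed
```

Rule 3 (masking) is about specification rather than runtime checks, so it is left out here; in modern terms these laws anticipate sandboxing, capability systems, and covert-channel analysis.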
OpenAI (2023)
We are improving our AI systems’ ability to learn from human feedback and to assist humans at evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems.
Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent. We take an iterative, empirical approach: by attempting to align highly capable AI systems, we can learn what works and what doesn’t, thus refining our ability to make AI systems safer and more aligned. Using scientific experiments, we study how alignment techniques scale and where they will break.
We tackle alignment problems both in our most capable AI systems as well as alignment problems that we expect to encounter on our path to AGI. Our main goal is to push current alignment ideas as far as possible, and to understand and document precisely how they can succeed or why they will fail. We believe that even without fundamentally new alignment ideas, we can likely build sufficiently aligned AI systems to substantially advance alignment research itself.
Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together. Therefore we are committed to openly sharing our alignment research when it’s safe to do so: We want to be transparent about how well our alignment techniques actually work in practice and we want every AGI developer to use the world’s best alignment techniques.
At a high level, our approach to alignment research focuses on engineering a scalable training signal for very smart AI systems that is aligned with human intent. It has three main pillars:
- Training AI systems using human feedback (see the sketch after this list)
- Training AI systems to assist human evaluation
- Training AI systems to do alignment research
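The first pillar listed above, learning from human feedback, is commonly implemented by first training a reward model on pairwise human preferences and then optimizing the policy against that reward (RLHF). Below is a minimal sketch of the pairwise (Bradley-Terry style) preference loss in PyTorch; the tiny RewardModel, the embedding dimension, and the random stand-in data are illustrative assumptions, not OpenAI's actual training code.

```python
# Minimal sketch of reward-model training on pairwise human preferences,
# the usual core of "learning from human feedback" (RLHF).
# The toy RewardModel and random stand-in data are illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Scores an already-embedded response with a single scalar reward."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)   # shape: (batch,)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the probability that the
    # human-preferred response scores higher than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Stand-in for embedded (chosen, rejected) response pairs labeled by humans.
    chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)

    for _ in range(100):
        loss = preference_loss(model(chosen), model(rejected))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"final preference loss: {loss.item():.4f}")
```

A separate policy-optimization stage (for example PPO against the learned reward, with a KL penalty toward the original model) then consumes this reward signal; that stage, and the human labeling pipeline itself, are omitted here.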
Aligning AI systems with human values also poses a range of other significant sociotechnical challenges, such as deciding to whom these systems should be aligned. Solving these problems is important to achieving our mission, but we do not discuss them in this post.
ON CONTAINMENT, ALIGNMENT, EXISTENTIAL RISK, BILL OF RIGHTS, REGULATIONS…
- Artificial Intelligence Safety and Security (Artificial Intelligence and Robotics Series) by Roman V. Yampolskiy (Editor)
- Guidelines for Artificial Intelligence Containment (Yampolskiy et al, 2017)
- The Nature of Self-Improving Artificial Intelligence (Omohundro, 2007)
- Leakproofing the Singularity (Yampolskiy, 2012)
- Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures (Yampolskiy, 2015)
- Thinking Inside the Box: Controlling and Using an Oracle AI (Armstrong, Sandberg and Bostrom, 2012)
- Superintelligence: Paths, Dangers, Strategies (Bostrom, 2014). Superintelligence reading group & Capability control methods (Katja Grace)
- Research Priorities for Robust and Beneficial Artificial Intelligence (Russell, Dewey and Tegmark, 2015)
- The AGI Containment Problem (Babcock, Kramar and Yampolskiy, 2016)
- Concrete Problems in AI Safety (Amodei et al., 2016)
- Strategic Implications of Openness in AI Development (Nick Bostrom, Future of Humanity Institute, 2017)
- On the Future – Prospects for Humanity (Rees, 2018)
- AGI safety from first principles (Ngo, 2020)
- AGI Ruin: A List of Lethalities (Yudkowsky, 2022) Summary (McAleese)
- Ethical and social risks of harm from Language Models (Weidinger et al., DeepMind, 2021)
- On the Opportunities and Risks of Foundation Models (Bommasani et al., 2022)
- A Mechanistic Interpretability Analysis of Grokking (Nanda and Lieberum, 2022)
- X-Risk Analysis for AI Research (Hendrycks and Mazeika, 2022)
- New Research: Advanced AI may tend to seek power *by default*. Edouard Harris on AI safety and the risks that come with advanced AI (2022)
- The Alignment Problem from a Deep Learning Perspective (Ngo, Chan and Mindermann, 2023)
- OpenAI. Aligning language models to follow instructions (Lowe and Leike, 2023)
- How Not To Destroy the World With AI (Russell, 2023)
- Governance of superintelligence (Altman, Brockman and Sutskever, 2023)
- Natural Selection Favors AIs over Humans (Hendrycks, 2023)
- Let’s Verify Step by Step and Improving Mathematical Reasoning with Process Supervision (OpenAI blog, 2023)
- GPT-4 System Card (OpenAI, March 23, 2023)
- LLaMA-2 from the Ground Up. Everything you need to know about the best open-source LLM on the market. (2023)
- Artificial Intelligence Risk Management Framework (AI RMF 1.0) (NIST, 2023)
- A European approach to artificial intelligence (EU, 2023) PDF
- GOV UK. Policy paper. A pro-innovation approach to AI regulation (2023)
- BLUEPRINT FOR AN AI BILL OF RIGHTS. MAKING AUTOMATED SYSTEMS WORK FOR THE AMERICAN PEOPLE (The White House, 2023)
- Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL LAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE (ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION LEGISLATIVE ACTS (EU, 2023)
- ANNEXES to the Proposal for a Regulation of the European Parliament and of the Council LAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE (ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION LEGISLATIVE ACTS (EU, 2023)
- AI Alignment and AI Safety and Existential risk from artificial general intelligence and Technological singularity and Artificial intelligence (Wikipedia, 2023)
- Mechanistic Interpretability Quickstart Guide – Lesswrong (2023)
- Right on Track: NVIDIA Open-Source Software Helps Developers Add Guardrails to AI Chatbots. NeMo Guardrails helps enterprises keep applications built on large language models aligned with their safety and security requirements. (2023)
- AI Deception: When Your Artificial Intelligence Learns to Lie. We need to understand the kinds of deception an AI agent may learn on its own before we can start proposing technological defenses – IEEE Spectrum (2023)
- A Mechanistic Interpretability Analysis of Grokking – Lesswrong – more links (2022/23)
- Tracing How LangChain Works Behind the Scenes – Kelvin Lu
- Opinion | We Need a Manhattan Project for AI Safety. AI presents an enormous threat. It deserves an enormous response. – Politico (2023)
- What Is ChatGPT Doing … and Why Does It Work? Stephen Wolfram
- Discovering Language Model Behaviors with Model-Written Evaluations. RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. (2023)
- Fundamental Limitations of Alignment in Large Language Models (2023)
- Generative Language Modeling for Automated Theorem Proving (2020)
- HyperTree Proof Search (HTPS) for Neural Theorem Proving (2023)
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Using a sparse autoencoder, we extract a large number of interpretable features from a one-layer transformer. (2023)
- Towards Understanding Grokking: An Effective Theory of Representation Learning (2023)
- Probabilistic Programming (Cornell University lecture)
- Anthropomorphic reasoning about neuromorphic AGI safety (2016) – PDF
“Language models have become more capable and more broadly deployed, but our understanding of how they work internally is still very limited. For example, it might be difficult to detect from their outputs whether they use biased heuristics or engage in deception. Interpretability research aims to uncover additional information by looking inside the model.” – OpenAI, Language models can explain neurons in language models
Learn more at Future of Life Institute
- Artificial Intelligence. From recommender algorithms to chatbots to self-driving cars, AI is changing our lives. As the impact of this technology grows, so will the risks.
- Benefits & Risks of Artificial Intelligence
Featured posts on Artificial Intelligence
- Can we rely on information sharing? (October 26, 2023) – We have examined the Terms of Use of major general-purpose AI system developers and found that they fail to provide assurances about the quality, reliability, and accuracy of their products or services.
- Written Statement of Dr. Max Tegmark to the AI Insight Forum (October 24, 2023) – The Future of Life Institute President addresses the AI Insight Forum on AI innovation and provides five US policy recommendations.
- As Six-Month Pause Letter Expires, Experts Call for Regulation on Advanced AI Development (September 21, 2023) – This week will mark six months since the open letter calling for a six-month pause on giant AI experiments. Since then, a lot has happened. Our signatories reflect on what needs to happen next.
- Characterizing AI Policy using Natural Language Processing (December 16, 2022) – As interest in Artificial Intelligence (AI) grows across the globe, governments have focused their attention on identifying the soft and […]
- Superintelligence survey (August 15, 2017) – The Future of AI – What Do You Think? Max […]
- A Principled AI Discussion in Asilomar (January 18, 2017) – The Asilomar Conference took place against a backdrop of growing interest from wider society in the potential of artificial intelligence […]
- Introductory Resources on AI Safety Research (February 29, 2016) – Reading list to get up to speed on the main ideas in the field. The resources are selected for relevance and/or brevity, […]
- AI FAQ (October 12, 2015) – Frequently Asked Questions about the Future of Artificial Intelligence […]