priorities
As of July 2024, my top priorities are about reducing humanity’s risk of destroying itself with AI over the next decade or two. I wish more people would “wake up” to this issue, and develop and express more informed opinions about it, since literally everyone under the age of ~60 is personally at risk. If more people demand the right to understand this risk and speak up about it, then we and our children will all stand a better chance of surviving — and hopefully flourishing — with AI technology.
I’ve sorted the priorities into two categories: restrictive efforts to prevent harmful patterns in AI development, and constructive efforts to set positive examples for how AI technologies should be developed and used.
Restrictive Efforts
- Datacenter Certifications: To train or operate AI systems, datacenters large enough to qualify as high risk should be required to obtain safety and security certifications that constrain both training and operation. These requirements should include immutable audit trails and be upheld by international alliances and inspections. Numerical thresholds defining high-risk datacenters should likely be reduced over time. Proofs of safety should be required for high-risk training runs and deployments, using either formal mathematics or high-confidence fault tree analysis.
Discussion
Problem being addressed:
Powerful AI technologies present risks to humanity both during training and during operation, with the potential to yield rogue AI (Bengio, 2023). During training, scientists could lose control of a system if it learns to copy itself, manipulate its operators, or otherwise break out of a datacenter, or if hackers penetrate the datacenter and steal it. During operation (“runtime”), loss of control is again a risk, as are direct harms to society.
Runtime policies for the use of AI post-training are also important. After a datacenter is used to train a powerful AI system, running that system is typically much less expensive, allowing hundreds or thousands of copies of the system to be run in parallel. Thus, additional policies are needed to govern how and when AI systems are run or deployed.
Over time, as more efficient algorithms enable super-human advantages with fewer and fewer computing resources, smaller datacenters will present larger risks of spawning rogue AI. Thus, thresholds defining “high risk” datacenters should be lowered over time, unless high-confidence countermeasures emerge to defend against rogue AI and mitigate these risks.
Why this approach:
Datacenters are physical objects that are relatively easy to define and track, presenting one of several complementary triggers for attending to AI risk. Larger datacenters can train and host more powerful AI systems, so datacenter capacities present natural criteria for oversight.
State of progress:
As of the United States Executive Order released on October 30, 2023, the US executive branch is evidently attending to datacenter capacities as risk factors in AI safety. The order instated reporting requirements for “any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 10^20 integer or floating-point operations per second for training AI.”
Still, these numbers should probably be reduced in years to come. One can estimate human brain performance at roughly 10^16 floating point operations per second: 10^11 neurons * 1000 synapses/neuron * 1 float/synapse * 100 operations/second. So, a datacenter capable of 10^20 FLOP/s could in principle simulate hundreds or thousands of human-level scientists, which together could present a significant risk to the public. This relates to the next agenda item.
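For readers who want to check the arithmetic, here is the same back-of-envelope estimate written as a short Python sketch; the neuron and synapse counts are the order-of-magnitude assumptions stated above, not measured values.

```python
# Back-of-envelope estimate of human brain compute, using the same
# order-of-magnitude assumptions as the paragraph above.
neurons = 1e11               # ~10^11 neurons in a human brain (assumption)
synapses_per_neuron = 1e3    # ~1000 synapses per neuron (assumption)
ops_per_synapse_per_s = 1e2  # ~100 operations per synapse per second (assumption)

brain_flops = neurons * synapses_per_neuron * ops_per_synapse_per_s
print(f"Estimated brain compute: {brain_flops:.0e} FLOP/s")          # ~1e+16

# The Executive Order's reporting threshold for a single cluster:
datacenter_flops = 1e20
print(f"Brain-equivalents at threshold: {datacenter_flops / brain_flops:.0e}")  # ~1e+04
```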
Also, certification regimes are still needed in other countries besides the US.
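As a minimal illustration of the “high-confidence fault tree analysis” mentioned in the Datacenter Certifications item above, the following Python sketch shows the basic arithmetic of combining basic-event probabilities through AND and OR gates; the event names and probabilities are hypothetical placeholders, not estimates from the original text.

```python
# Minimal fault-tree arithmetic sketch (hypothetical events and probabilities).

def or_gate(*probs):
    """Probability that at least one independent child event occurs."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def and_gate(*probs):
    """Probability that all independent child events occur."""
    p_all = 1.0
    for p in probs:
        p_all *= p
    return p_all

# Hypothetical basic events for a "loss of control during training" top event:
p_self_exfiltration = and_gate(1e-3, 1e-2)   # model attempts escape AND monitoring misses it
p_external_theft    = and_gate(1e-2, 1e-1)   # intrusion succeeds AND weights are exfiltrated
p_top = or_gate(p_self_exfiltration, p_external_theft)
print(f"Estimated probability of the top event: {p_top:.1e}")
```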
- Speed Limits: AI systems and their training protocols should have compute usage restrictions, including speed limits (measured in bit operations per second) to prevent them from getting out of control. The limits should be mandated by datacenter certifications (above) as well as liability laws (below).
Discussion
Problem being addressed:
Without internationally enforced speed limits on AI, humanity is unlikely to survive. If AI is not speed-limited, by the end of this decade humans will look more like plants than animals from the perspective of AI systems: big slow chunks of biofuel showing weak signs of intelligence when left undisturbed for ages (seconds) on end. That is roughly what humans would look like to a system just 50x faster than us.
Alarmingly, over the next decade AI can be expected to achieve a 100x or even 1,000,000x speed advantage over us. Why?
Human neurons fire at a rate of around 100 Hz, while computer chips “fire” at rates measured in GHz: tens of millions of times faster than us. Current AI has not been distilled to run maximally efficiently, but will almost certainly run 100x faster than humans eventually, and 1,000,000x seems achievable in principle.
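Here is a short Python sketch of the arithmetic behind these figures; the clock rates are rough order-of-magnitude assumptions rather than measurements.

```python
# Rough clock-rate comparison; all numbers are order-of-magnitude assumptions.
neuron_firing_rate_hz = 100     # ~100 Hz for biological neurons
chip_clock_rate_hz = 3e9        # ~3 GHz for a typical modern chip

print(f"Raw hardware speed ratio: {chip_clock_rate_hz / neuron_firing_rate_hz:.0e}x")  # ~3e+07

# What a given overall speed advantage means in subjective time:
for advantage in (50, 100, 1_000_000):
    print(f"At {advantage:,}x, one human hour of wall-clock time gives the AI "
          f"~{advantage / 24:.1f} subjective days of thinking time")
```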
As a counterpoint, one might think AI will decide to keep humans around, the same way humans have decided to protect or cultivate various species of plants and fungus. However, most species have not been kept around: around 99.9% of all species on Earth have gone extinct (source: Wikipedia), and the current mass extinction period — the anthropocene extinction, which started around 10,000 years ago with the rise of human civilization — is occurring extremely quickly in relation to the history of the other species involved. Unless we make a considerable effort to ensure AI will preserve human life, the opposite should be assumed by default.
Why this approach:
Speed is a key determining factor in many forms of competition, and speed limits are a simple concept that should be broadly understandable and politically viable as a means of control. Speed limits already exist to protect human motorists and aircraft, to regulate internet traffic, and to protect wildlife and prevent erosion in nature reserves. Similarly, speed limits on how fast AI systems are allowed to think and act relative to humans could help to protect humans and human society from being impacted or gradually “eroded” by AI technology. For instance, if a rogue AI begins to enact a dangerous plan by manipulating humans, but slowly enough for other humans to observe and stop it, we are probably much safer than if the AI is able to react and adjust its plans hundreds of times faster than us.
State of progress:
Presently, no country has imposed a limit on the speed at which an AI system may be operated. As discussed above under Datacenter Certifications, the US executive branch is at least attending to the total speed capacity of a datacenter — measured in floating point operations per second — as a risk factor warranting federal certification and oversight. However, there are three major gaps:
Gap #1: There is no limit on how fast an AI system will be allowed to operate under federal supervision, and thus no actual speed limit is in force in the US.
Gap #2: After an AI system is trained, the computing resources used to train it are typically sufficient to run hundreds or thousands of copies of it, collectively yielding a much larger total speed advantage than was observed during training (a rough sketch of this arithmetic follows the list of gaps below). Thus, perhaps stricter speed limits should apply at runtime than at training time.
Gap #3: Other countries besides the US have yet to publicly adopt any speed-related oversight policies at all (although perhaps it won’t be long until the UK adopts some).
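To make Gap #2 concrete, here is a heavily simplified Python sketch of the underlying arithmetic; the cluster size, parameter count, and token-rate figures are illustrative assumptions rather than measurements.

```python
# Heavily simplified sketch of why a training cluster can serve many runtime copies.
# Every number below is an illustrative assumption, not a measurement.

cluster_flops = 1e20                 # sustained FLOP/s of a hypothetical training cluster
params = 1e12                        # hypothetical model size: 10^12 parameters

# Common rule of thumb: generating one token costs roughly 2 * params FLOPs.
flops_per_token = 2 * params

# Assume one "human-speed" copy corresponds to ~10 tokens per second.
tokens_per_s_per_copy = 10
flops_per_copy = flops_per_token * tokens_per_s_per_copy

parallel_copies = cluster_flops / flops_per_copy
print(f"Idealized parallel human-speed copies: ~{parallel_copies:.0e}")   # ~5e+06

# In practice, memory bandwidth, batching overheads, and context handling reduce
# this by orders of magnitude; the point is only that runtime capacity scales
# with the same hardware that performed the training.
```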
- Liability Laws: Both the users and developers of AI technology should be held accountable for harms and risks produced by AI, including “near-miss” incidents. There should be robust whistleblower protections to discourage concealment of risks from institutions. (See also this FLI position paper.) To enable accountability, it should be illegal to use or distribute AI from an unattributed source. Private rights of action should empower individuals and communities harmed or placed at risk by AI to take both the users and developers of the AI to court.
Discussion
Problem being addressed:
Normally, if someone causes significant harm or risk to another person or society, they are held accountable for that. For instance, if someone releases a toxin into a city’s water supply, they can be punished for the harm or risk of harm to the many people in that city.
Currently, the IT industry operates with much less accountability in this regard, as can be seen from the widespread mental health difficulties caused by novel interactions enabled by social media platforms. Under US law, social media platforms have rarely been held accountable for these harms, largely because of Section 230.
The situation with AI so far is not much better. Source code and weights for cutting edge AI systems are often shared without restriction, and without liability for creators. Sometimes, open source AI creations have many contributors, many of whom are anonymous, further limiting liability for the effects of these technologies. As a result, harms and risks proliferate with little or no incentive for the people creating them to be more careful.
Why this approach:
Since harms from AI technology can be catastrophic, risks must be penalized alongside actualized harms, to prevent catastrophes from ever occurring rather than only penalizing them after the fact.
Since developers and users both contribute to harms and risks, they should both be held accountable for them. Users often do not understand the properties of the AI they are using, and developers often do not understand the context in which their AI will be applied, so it does not make sense to place full accountability on either users or developers.
As open source AI development advances, there could be many developers behind any given AI technology, and they might be anonymous. If a user uses an AI system that is not traceable to an accountable developer, they degrade society’s ability to attribute liability for the technology, and should be penalized accordingly.
Financial companies are required to “know their customer”, so this principle should be straightforward to apply in the AI industry as well. But in the case of open source development, a “know your developer” principle is also needed, to trace accountability in the other direction.
State of progress:
Currently, “Know Your Developer” laws do not exist for AI in any country, and do not appear to be in development as yet. AI-specific “Know Your Customer” laws at least appear to be under development in the US, as the October 2023 US Executive Order requires additional record-keeping for “Infrastructure as a Service” (IaaS) companies when serving foreign customers. These orders do not protect Americans from harms or risks from AI systems hosted entirely by foreign IaaS companies, although there is an expressed intention to “develop common regulatory and other accountability principles for foreign nations, including to manage the risk that AI systems pose.”
- Labeling Requirements: The United Nations should declare it a fundamental human right to know whether one is interacting with another human or a machine. Designing or allowing an AI system to deceive humans into believing it is a human should be declared a criminal offense, except in specially licensed contexts for safety-testing purposes. Content that is “curated”, “edited”, “co-authored”, “drafted”, or “wholly generated” by AI should be labeled accordingly, and the willful or negligent dissemination of improperly labeled AI content should also be a criminal offense.
Discussion
Problem being addressed:
Since AI might be considerably more intelligent than humans, we may be more susceptible to manipulation by AI systems than by other humans. As such, humans should be empowered to exercise a greater degree of caution when interacting with AI.
Also, if humans are unable to distinguish other humans from AI systems, we will become collectively unable to track the harms and benefits of technology to human beings, and also unable to trace human accountability for actions.
Why this approach:
Labeling AI content is extremely low cost, and affords myriad benefits.
State of progress:
The October 2023 US Executive Order includes in Section 4.5 intentions to develop “standards, tools, methods, and practices” for labeling, detecting, and tracking the provenance of synthetic content. However, little attention is given to the myriad ways AI can be used to create content without wholly generating it, such as through curation, editing, co-authoring, or drafting. Further distinctions are needed to attend to these cases.
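To make these distinctions concrete, a machine-readable provenance label could look something like the sketch below; the field names and category values are a hypothetical illustration, not an existing standard.

```python
# Hypothetical machine-readable provenance label for a piece of content.
# Field names and category values are illustrative only, not an existing standard.

AI_INVOLVEMENT_LEVELS = [
    "none",             # produced entirely by humans
    "curated",          # AI selected or ranked human-made material
    "edited",           # AI modified human-written content
    "co-authored",      # humans and AI contributed substantial portions
    "drafted",          # AI produced a draft that humans then revised
    "wholly_generated", # AI produced the content with no human authorship
]

def make_label(ai_involvement, models_used, human_contributors):
    """Build a provenance label; raises if the involvement level is unknown."""
    if ai_involvement not in AI_INVOLVEMENT_LEVELS:
        raise ValueError(f"unknown involvement level: {ai_involvement!r}")
    return {
        "ai_involvement": ai_involvement,
        "models_used": models_used,              # e.g. model names/versions
        "human_contributors": human_contributors,
    }

# Example: an article drafted by a (hypothetical) model and revised by a human editor.
label = make_label("drafted", models_used=["example-model-v1"],
                   human_contributors=["editor@example.org"])
print(label)
```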
- Veto Committees: Any large-scale AI risk, like a significant deployment or large training run, should only be taken with the unanimous consent of a committee that is broadly representative of the human public. This committee should be selected through a fair and principled process, such as a sortition algorithm with adjustments to prevent unfair treatment of minority groups and disabled persons.
Discussion
Problem being addressed:
Increasingly powerful AI systems present large-scale risks to all of humanity, up to and including extinction risk. It is deeply unfair to undertake such risks without consideration for the many people who could be harmed by them. Also, while AI presents many benefits, if those benefits accrue only to a privileged class of people, risks to persons outside that class are wholly unjustified.
Why this approach:
It is very costly to defer every specific decision to the entire public, except if technology is used to aggregate public opinion in some manner, in which case the aggregation technology should itself be subject to a degree of public oversight. It is more efficient to choose representatives from diverse backgrounds to stand in protection of values that naturally emerge from those backgrounds.
However, equal representation is not enough. Due to patterns of coalition-formation based on salient characteristics such as appearance or language (see Schelling Segregation), visible minority groups will systematically suffer disadvantages unless actively protected from larger groups.
Finally, individuals suffering from disabilities are further in need of protection, irrespective of their background.
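As a minimal illustration of the “sortition algorithm with adjustments” mentioned above, the following Python sketch draws a committee by random sampling while guaranteeing a minimum number of seats to every group, so that small groups cannot be crowded out; the group labels, sizes, and quotas are hypothetical and not from the original text.

```python
import random

def draw_committee(candidates_by_group, seats, min_seats_per_group=1, seed=None):
    """Stratified sortition sketch: random selection with a guaranteed floor
    of seats for every group, so small groups cannot be crowded out.
    `candidates_by_group` maps a group label to a list of candidate IDs."""
    rng = random.Random(seed)
    committee = []

    # First, guarantee each group its minimum number of seats.
    for group, members in candidates_by_group.items():
        committee += rng.sample(members, min(min_seats_per_group, len(members)))

    # Then fill the remaining seats by uniform lottery over everyone not yet chosen.
    remaining_pool = [m for members in candidates_by_group.values()
                      for m in members if m not in committee]
    remaining_seats = seats - len(committee)
    committee += rng.sample(remaining_pool, max(0, remaining_seats))
    return committee

# Hypothetical example with made-up group labels:
groups = {
    "group_a": [f"a{i}" for i in range(1000)],
    "group_b": [f"b{i}" for i in range(100)],
    "group_c": [f"c{i}" for i in range(10)],   # a small minority group
}
print(draw_committee(groups, seats=12, min_seats_per_group=2, seed=42))
```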
State of progress:
The October 2023 US Executive Order expresses the US Federal Government’s intention of “identifying and circulating best practices for agencies to attract, hire, retain, train, and empower AI talent, including diversity, inclusion, and accessibility best practices”.
However, these intentions are barely nascent, and do not as yet seem on track to yield transparent and algorithmically principled solutions, which should be made a high priority.
- Global Off-Switches: Humanity should collectively maintain the ability to gracefully shut down AI technology at a global scale, in case of emergencies caused by AI. National and local shutdown capacities are also advisable, but not sufficient because AI is so easily copied across jurisdictions. “Fire drills” to prepare society for local and global shutdown events are needed to reduce harm from shutdowns, and to maintain willingness to use them.
Discussion
Problem being addressed:
It is normal for a company to take its servers offline in order to perform maintenance. Humanity as a whole needs a similar capacity for all AI technology. Without the ability to shut down AI technology, we remain permanently vulnerable to any rogue AI system that surpasses our mainline defenses. Without practice using shutdowns, we will become over-dependent on AI technology, and unable to even credibly threaten to shut it down.
Why this approach:
This approach should be relatively easy to understand by analogy with “fire drills” practiced by individual buildings or cities prone to fire emergencies, since rogue AI, just like fires, would be capable of spreading very quickly and causing widespread harm.
There is certainly a potential for off-switches to be abused by bad actors, and naturally measures would have to be taken to guard against such abuse. However, the argument “Don’t create off-switches because they could be abused” should not be a crux, especially since the analogous argument, “Don’t create AGI because it could be abused”, is currently not carrying enough weight to stop AGI development.
State of progress:
To our knowledge, nothing resembling a global or even local AI shutdown capacity exists, aside from crude local shutdowns via power outages or electromagnetic interference with electronics. Even if such capacities exist in secret, they are not being practiced, and a strong commercial pressure exists for computing infrastructure companies to maximize server uptimes to near 100% (see the Strasbourg datacenter fire of 2021). Thus society is unprepared for AI shutdown events, and as a result, anyone considering executing a national-scale or global shutdown of AI technology will face an extremely high burden of confidence to justify their actions. This in turn will result in under-use and atrophy of shutdown capacities.
Constructive Efforts
- Collective Intelligence: In lieu of superintelligent machines replacing humanity, I’d like to see humanity itself becoming collectively superintelligent as a species. Interoperable AI networks should be developed that assist humans in reaching positive-sum agreements. Today’s language models can be deployed as tools to accelerate this process. Even AI governance decisions can benefit from interoperable AI models that assist and mediate group decision-making. For an inspiring vision in this area, see this TED Talk by Divya Siddarth at the Collective Intelligence Project.
- AI Healthtech: AI has a tremendous potential to save lives, and I want to be part of that. First of all, saving lives is intrinsically rewarding. Second, the healthcare industry is an excellent setting for building generalizable caring and caretaking capacities for AI. Third, the more we can save human lives with a given generation of AI models, the less morally compelling it will be to risk developing more powerful models that might be harder to control. Finally, health in general is a relatively geopolitically stabilizing objective for companies and nations to pursue with AI technology, by comparison to other industries like aerospace and defense. To help support AI healthtech development, I’ve joined HealthcareAgents as a co-founder.
- Protective moralities: I want to support morally motivated initiatives that, by symmetry, might increase humanity’s chances of being treated well by advanced AI even if we no longer directly control it. Examples include freedom and sovereignty for individuals and territories (example), mercy toward other species (example), fair allocation of resources (example), cooperativity (see collective intelligence above), and caring and caretaking toward others (see AI healthtech above). These are abstract moral objectives that, should they end up applying to AI systems, might be somewhat protective of humanity as a special case.
- Guaranteed-safe AI: Most technologies depend crucially on safety specs in order to function as intended, including electricity, motors, foods, drugs, cars, and airplanes — AI should be no different. If AI software is developed with human-legible quantitative safety guarantees, it could be a huge win for the human economy, by allowing humans to safely construct more infrastructure with reliable AI components (as long as those humans also behave responsibly!). For an outline of potential approaches, see Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems.
- Hardware-level controls: It may be possible to prevent AI systems from “going rogue” in various ways with the help of cleverly designed hardware mechanisms for shutting down and/or reporting on dangerous behaviors. See this memo on flexible hardware-enabled [chip] governors or “Flex-HEGs” for a detailed approach. Designing and building such hardware is a constructive effort with many applications, even if those applications will often be restrictive in some way (see the next section).
online resources
- Reasoning through arguments against taking AI safety seriously (2024) — yoshua bengio’s counter to common arguments for dismissing AI safety as important;
- Can Humanity Survive AI? (2024) — great jacobin article about the state of the AI x-risk discourse;
- Don’t Worry About the Vase (2023+) — zvi mowshowitz’s weekly (or so) overviews about the AI news and associated discourse (among other topics);
- My techno-optimism (2023) — vitalik buterin’s vision about how to combine the virtues of technological progress and the need to steer the future;
- Shallow review of live agendas in alignment & safety (2023) — overview of the AI safety research as of 2023;
- How Rogue AIs may Arise (2023) — an article by yoshua bengio about how we could lose control over powerful AIs;
- Ilya: the AI scientist shaping the world (2023) — short documentary about ilya sutskever and his views about catastrophic AI risk;
- Don’t Look Up – The Documentary (2023) — a well done 17 minute documentary summarising the AI xrisk arguments as of may 2023;
- The AI Dilemma (2023) — tristan harris and aza raskin describe in excruciating detail the pickle humanity has gotten itself into with AI;
- We must slow down the race to God-like AI (2023) — ian hogarth on the current LLM suicide race (archive link);
- The ’Don’t Look Up’ Thinking That Could Doom Us With AI (2023) — max tegmark cataloguing bad arguments for ignoring the danger of superintelligence;
- This Changes Everything (2023) — ezra klein’s take on how the leading AI labs are gambling with literally everyone’s lives now (archive link);
- The case for taking AI seriously as a threat to humanity (2018) — popular explanation of the existential risk from AI;
- Benefits & Risks of Artificial Intelligence (2016) — FLI’s comprehensive overview and link collection about AI risk;
- How to pursue a career in technical AI alignment (2022) — guide for people who are considering direct work on technical AI alignment;
- Overview of Technical Alignment Landscape (2022) — a snapshot of what various AI alignment organisations were working on as of 2022;
- Annotated bibliography of recommended materials (2016) — CHAI’s reading list;
- AI Research Considerations for Human Existential Safety (ARCHES) (2020) — a thorough survey article of research that’s relevant to reducing human extinction risks from advanced AI;
- AI Alignment Forum — main online forum for technical AI safety research;
- The Importance of AI Alignment, explained in 5 points (2023) — a crisp semi-technical case for AI alignment;
- Alignment Research Field Guide (2019) — a guide to AI alignment research;
- AGI Safety Fundamentals (2023) — a curriculum to teach AI alignment fundamentals;
- Alignment Newsletter — email newsletter monitoring progress in AI safety research (dormant as of 2023);
- Cold Takes — a blog by holden karnofsky explaining the “longtermist mindset”;
- The “most important century” series (2021) — an argument why this century is special;
- Slate Star Codex — general commentary on AI (among other topics) by scott alexander;
- Meditations on Moloch (2014) — a really good (albeit long) essay about the greater class of coordination problems humanity is facing, of which technological x-risks are just one instance;
- The Demiurge’s Older Brother (2017) — a fun fictional story about superintelligent AIs reasoning about each other.
books
- Brian Christian. The Alignment Problem (2020) — a clearly written overview of existing and near(er) term problems with mis-aligned AI that the field has been grappling with;
- Toby Ord. The Precipice (2020) — a thorough analysis of species-wide risks facing humanity in the coming century;
- Stuart Russell. Human Compatible (2019) — well-argued treatise of AI alignment problem along with suggestions how to make the field of AI safer by the co-author of the most popular AI textbook;
- Max Tegmark. Life 3.0 (2017) — an accessible overview of our long-term future with AI that also recounts FLI’s history and related efforts;
- Nick Bostrom. Superintelligence (2014) — the granddaddy of books about long term AI risk.
my own views
- Oxford Union Debate (2023) — arguing at the oxford union that the near-term x-risk from rogue AI is under-appreciated and the counter-arguments to that are very poor;
- Artificial Intelligence and You (2023) — interview with peter scott;
- The Logan Bartlett Show (2023) — a bit broader interview, also discussing Skype and crypto in addition to AI;
- The Cognitive Revolution Podcast (2023) — discussion of the FLI open letter and beyond with nathan labenz (highlights by liron shapira);
- World of DaaS podcast (2023) — a short (27 minute) chat with auren hoffman;
- Tallinn Digital Summit (2022) — a “fireside chat” at Tallinn Digital Summit 2022;
- FLI Podcast (2021) — a conversation with lucas perry of the future of life institute;
- Topos Institute fireside chat (2021) — a conversation with davidad about the intersection of AI alignment and category theory;
- Palladium Magazine Digital Salon (2020) — an online interview with thoughtful questions;
- Manifold Podcast (2020) — a fun interview with steve hsu;
- The Guardian (2019) — a “general interest” story (originally published in Popular Science);
- Financial Times Tech Tonic podcast (2019) — a short (22 minute) interview from 2019;
- Toy Model of The Control Problem (2016) — slides from a talk where i used a simple gridworld to illustrate the AI control problem (and related problems);
- Edge.org (2015) — an interview from january 2015 covering my views re the AI-risk and x-risks in general;
- A playlist of my talks and interviews on Youtube.
organisations
- FLI: the first academic x-risk organisation in the US, presided over by my good friend max tegmark, who is a professor at MIT and has relevant experience running FQXi. i’m one of the 5 co-founders myself. FLI made a splash in 2015 by organising an impressive AI risk conference in puerto rico and then again in 2023 with the “six-month pause letter”;
- CAIS: SF-based AI safety research and advocacy organisation behind the 2023 statement on AI risk; i joined their board in 2024;
- CSER: an x-risk organisation at cambridge university that i co-founded with professor huw price and lord martin rees (who was the master of trinity college at the time). initially its main contribution was to lend credibility to x-risk concerns by associating a long list of high-profile scientists with the cause (plus publishing articles and giving talks). since then CSER has contributed to (necessarily academic) research in the x-risk and catastrophic risk domain;
- METR: the leading “AI evals” organisation;
- Apollo Research: european AI evals organisation;
- MIRI: the oldest x-risk organisation (started in 2000 by eliezer yudkowsky as the singularity institute) that, after selling the “singularity” brand and the singularity summit conference, has concentrated on solving various math problems that prevent us from developing powerful AI agents that would behave in a predictable manner;
- CFAR: another offshoot of the singularity institute, providing rationality training. i’ve attended their workshop myself and i’m now sponsoring young estonian science olympiad winners to attend their workshops. CFAR also has a “SPARC” workshop that’s aimed at young math talent — the goal being to introduce upcoming mathematicians to rationality and the x-risk issues, and potentially involve them in the kind of work MIRI is doing;
- FHI: the oldest x-risk organisation in academia. established in 2005 by the oxford philosopher nick bostrom whose latest book (“superintelligence”) has caused a big shift in the AI risk discourse. given the importance and diversity of their research, i consider FHI to be one of the best places on the planet for interesting and profound discussions;
- AI Impacts: a cross between MIRI and FHI started by a longtime SIAI/MIRI/FHI researcher katja grace. famous for their widely cited AI expert surveys;
- CFI: the leverhulme centre for the future of intelligence, the research organisation co-founded by CSER with a £10M grant from the leverhulme trust;
- 80k Hours: an organisation within the effective altruism movement that does career coaching and advice for young people who are interested in maximising their contribution to the future. the nice thing about EA is that it is finally a working solution to the “tragedy of the commons” problem that x-risk reduction is plagued with: there is no way to profit from x-risk reduction. however, once people actually start doing calculations in terms of how to maximise the impact of altruistic activities, they quite reliably end up supporting x-risk reduction. in my view, the main value that 80kh brings to the table is to potentially address the “talent bottleneck” in x-risk research by directing more young people to it;
- GCRI: a small org in new york, doing academic research into catastrophic risks;
- CHAI: stuart russell’s centre at uc berkeley. stuart russell is the co-author of the leading AI textbook and has become the leading advocate in academia for long-term AI safety;
- SFF: a fund that’s helping me to do most of my philanthropy these days (in that role, SFF is a successor to BERI).
see also larks’ annual AI alignment literature reviews for more detailed overviews of the work the AI alignment ecosystem produces.