A very good read from a very respected source!!!

Meanwhile, every day, our p(doom) goes up and up and up…

Jaan Tallinn: As of July 2023, my top priorities for reducing existential risks from AI are these.

x-risk spiel

  • priorities

    As of November 2023, my top priorities for reducing extinction risk from AI — priorities that in my opinion should eventually be upheld at the level of international treaty — are the following:

    1. Datacenter Certifications: To train or operate AI systems, datacenters large enough to qualify as high-risk should be required to obtain safety and security certifications that constrain both training and operation. These requirements should include immutable audit trails and be upheld by international alliances and inspections. The numerical thresholds defining high-risk datacenters should likely be reduced over time. Proofs of safety should be required for high-risk training runs and deployments, using either formal mathematics or high-confidence fault tree analysis.

      Problem being addressed:

      Powerful AI technologies present risks to humanity both during training and during operation, with the potential to yield rogue AI (Bengio, 2023). During training, scientists could lose control of a system as early as training time if it learns to copy itself, manipulate its operators, or otherwise break out of a datacenter, or if hackers penetrate the datacenter and steal it. During operation (“runtime”), loss of control is again a risk, as well as direct harms to society.

      Runtime policies for the use of AI post-training are also important. After a datacenter is used to train a powerful AI system, running that system is typically much less expensive, allowing hundreds or thousands of copies of the system to be run in parallel. Thus, additional policies are needed to govern how and when AI systems are run or deployed.

      Over time, as more efficient algorithms enable super-human advantages with fewer and fewer computing resources, smaller datacenters will present larger risks of spawning rogue AI. Thus, thresholds defining “high risk” datacenters should be lowered over time, unless high-confidence countermeasures emerge to defend against rogue AI and mitigate these risks.

      Why this approach:

      Datacenters are physical objects that are relatively easy to define and track, presenting one of several complementary triggers for attending to AI risk. Larger datacenters can train and host more powerful AI systems, so datacenter capacities present natural criteria for oversight.

      State of progress:

      As of the United States Executive Order released on October 30, 2023, the US executive branch is evidently attending to datacenter capacities as risk factors in AI safety. The order instituted reporting requirements for “any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 10^20 integer or floating-point operations per second for training AI.”

      Still, these numbers should probably be reduced in years to come. One can estimate human brain performance at roughly 10^16 floating point operations per second: 10^11 neurons * 1000 synapses/neuron * 1 float/synapse * 100 operations/second. So, a datacenter capable of 10^20 FLOP/s could in principle simulate hundreds or thousands of human-level scientists, which together could present a significant risk to the public. This relates to the next agenda item.
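      The back-of-envelope arithmetic above can be reproduced in a few lines (a sketch using only the figures quoted in this paragraph; the constants are rough order-of-magnitude assumptions, not measurements):

```python
# Rough order-of-magnitude estimate of human-brain compute,
# using the figures quoted in the text above.
NEURONS = 1e11              # ~10^11 neurons in a human brain
SYNAPSES_PER_NEURON = 1e3   # ~1000 synapses per neuron
FLOATS_PER_SYNAPSE = 1      # 1 float per synapse
OPS_PER_SECOND = 1e2        # ~100 operations per second

brain_flops = NEURONS * SYNAPSES_PER_NEURON * FLOATS_PER_SYNAPSE * OPS_PER_SECOND
print(f"human brain ~ {brain_flops:.0e} FLOP/s")  # prints "human brain ~ 1e+16 FLOP/s"

EO_THRESHOLD = 1e20  # FLOP/s reporting threshold from the executive order
print(f"brain-equivalents at threshold: {EO_THRESHOLD / brain_flops:.0f}")  # 10000
```

      On these assumptions the threshold corresponds to roughly 10^4 brain-equivalents, consistent with the "hundreds or thousands of human-level scientists" figure once simulation overhead is accounted for.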

      Also, certification regimes are still needed in other countries besides the US.

    2. Speed Limits: AI systems and their training protocols should be subject to compute usage restrictions, including speed limits (measured in bit operations per second) to prevent them from getting out of control, a condition to be mandated by datacenter certifications (above) as well as liability laws (below).

      Problem being addressed:

      Without internationally enforced speed limits on AI, humanity is unlikely to survive. If AI is not speed-limited, by the end of this decade humans will look more like plants than animals from the perspective of AI systems: big slow chunks of biofuel showing weak signs of intelligence when left undisturbed for ages (seconds) on end. Imagine what humans look like from the perspective of a system just 50x faster than us.

      Alarmingly, over the next decade AI can be expected to achieve a 100x or even 1,000,000x speed advantage over us. Why?

      Human neurons fire at a rate of around 100 Hz, while computer chips “fire” at rates measured in GHz: tens of millions of times faster than us. Current AI has not been distilled to run maximally efficiently, but will almost certainly run 100x faster than humans eventually, and 1,000,000x seems achievable in principle.
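      For concreteness, the clock-rate comparison works out as follows (a sketch; the 3 GHz figure is an assumed typical chip clock, not a number from the text):

```python
NEURON_HZ = 100    # typical neuron firing rate, ~100 Hz
CHIP_HZ = 3e9      # an assumed ~3 GHz chip clock

ratio = CHIP_HZ / NEURON_HZ
print(f"chips 'fire' {ratio:.0e}x faster")  # 3e+07, i.e. tens of millions
```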

      As a counterpoint, one might think AI will decide to keep humans around, the same way humans have decided to protect or cultivate various species of plants and fungi. However, most species have not been kept around: around 99.9% of all species that ever lived on Earth have gone extinct (source: Wikipedia), and the current mass extinction period, the Anthropocene extinction, which began around 10,000 years ago with the rise of human civilization, is occurring extremely quickly relative to the evolutionary history of the species involved. Unless we make a considerable effort to ensure AI will preserve human life, the opposite should be assumed by default.

      Why this approach:

      Speed is a key determining factor in many forms of competition, and speed limits are a simple concept that should be broadly understandable and politically viable as a means of control. Speed limits already exist to protect human motorists and aircraft, to regulate internet traffic, and to protect wildlife and prevent erosion in nature reserves. Similarly, speed limits on how fast AI systems are allowed to think and act relative to humans could help to protect humans and human society from being impacted or gradually “eroded” by AI technology. For instance, if a rogue AI begins to enact a dangerous plan by manipulating humans, but slowly enough for other humans to observe and stop it, we are probably much safer than if the AI is able to react and adjust its plans hundreds of times faster than us.

      State of progress:

      Presently, no country has imposed a limit on the speed at which an AI system may be operated. As discussed above under Datacenter Certifications, the US executive branch is at least attending to the total speed capacity of a datacenter — measured in floating point operations per second — as a risk factor warranting federal certification and oversight. However, there are three major gaps:

      Gap #1: There is no limit on how fast an AI system will be allowed to operate under federal supervision, and thus no actual speed limit is in force in the US.

      Gap #2: After training an AI system, typically the computing resources needed to train it are sufficient to run hundreds or thousands of copies of it, collectively yielding a much larger total speed advantage than observed during training. Thus, perhaps stricter speed limits should exist at runtime than at training time.

      Gap #3: Other countries besides the US have yet to publicly adopt any speed-related oversight policies at all (although perhaps it won’t be long before the UK adopts some).

    3. Liability Laws: Both the users and developers of AI technology should be held accountable for harms and risks produced by AI, including “near-miss” incidents, with robust whistleblower protections to discourage concealment of risks from institutions. (See also this FLI position paper.) To enable accountability, it should be illegal to use or distribute AI from an unattributed source. Private rights of action should empower individuals and communities harmed or placed at risk by AI to take both the users and developers of the AI to court.

      Problem being addressed:

      Normally, if someone causes significant harm or risk to another person or society, they are held accountable for that. For instance, if someone releases a toxin into a city’s water supply, they can be punished for the harm or risk of harm to the many people in that city.

      Currently, the IT industry operates with much less accountability in this regard, as can be seen from the widespread mental health difficulties caused by novel interactions enabled by social media platforms. Under US law, largely because of Section 230, social media platforms have rarely been held accountable for these harms.

      The situation with AI so far is not much better. Source code and weights for cutting edge AI systems are often shared without restriction, and without liability for creators. Sometimes, open source AI creations have many contributors, many of whom are anonymous, further limiting liability for the effects of these technologies. As a result, harms and risks proliferate with little or no incentive for the people creating them to be more careful.

      Why this approach:

      Since harms from AI technology can be catastrophic, risks must be penalized alongside actualized harms, to prevent catastrophes from ever occurring rather than only penalizing them after the fact.

      Since developers and users both contribute to harms and risks, they should both be held accountable for them. Users often do not understand the properties of the AI they are using, and developers often do not understand the context in which their AI will be applied, so it does not make sense to place full accountability on users nor on developers.

      As open source AI development advances, there could be many developers behind any given AI technology, and they might be anonymous. If a user uses an AI system that is not traceable to an accountable developer, they degrade society’s ability to attribute liability for the technology, and should be penalized accordingly.

      Financial companies are required to “know their customer”, so this principle should be straightforward to apply in the AI industry as well. But in the case of open source development, a “know your developer” principle is also needed, to trace accountability in the other direction.

      State of progress:

      Currently, “Know Your Developer” laws do not exist for AI in any country, and do not appear to be in development as yet. AI-specific “Know Your Customer” laws at least appear to be under development in the US, as the October 2023 US Executive Order requires additional record-keeping for “Infrastructure as a Service” (IaaS) companies when serving foreign customers. These orders do not protect Americans from harm or risks from AI systems hosted entirely by foreign IaaS companies, although there is an expressed intention to “develop common regulatory and other accountability principles for foreign nations, including to manage the risk that AI systems pose.”

    4. Labeling Requirements: The United Nations should declare it a fundamental human right to know whether one is interacting with another human or a machine. Designing or allowing an AI system to deceive humans into believing it is a human should be declared a criminal offense, except in specially licensed contexts for safety-testing purposes. Content that is “curated”, “edited”, “co-authored”, “drafted”, or “wholly generated” by AI should be labeled accordingly, and the willful or negligent dissemination of improperly labeled AI content should also be a criminal offense.

      Problem being addressed:

      Since AI might be considerably more intelligent than humans, we may be more susceptible to manipulation by AI systems than by other humans. As such, humans should be empowered to exercise a greater degree of caution when interacting with AI.

      Also, if humans are unable to distinguish other humans from AI systems, we will become collectively unable to track the harms and benefits of technology to human beings, and also unable to trace human accountability for actions.

      Why this approach:

      Labeling AI content is extremely low cost, and affords myriad benefits.

      State of progress:

      The October 2023 US Executive Order includes in Section 4.5 intentions to develop “standards, tools, methods, and practices” for labeling, detecting, and tracking the provenance of synthetic content. However, little attention is given to the myriad ways AI can be used to create content without wholly generating it, such as through curation, editing, co-authoring, or drafting. Further distinctions are needed to attend to these cases.

    5. Veto Committees: Any large-scale AI risk, like a significant deployment or large training run, should only be taken with the unanimous consent of a committee that is broadly representative of the human public. This committee should be selected through a fair and principled process, such as a sortition algorithm with adjustments to prevent unfair treatment of minority groups and disabled persons.

      Problem being addressed:

      Increasingly powerful AI systems present large-scale risks to all of humanity, even including extinction risk. It is deeply unfair to undertake such risks without consideration for the many people who could be harmed by it. Also, while AI presents many benefits, if those benefits accrue only to a privileged class of people, risks to persons outside that class are wholly unjustified.

      Why this approach:

      It is very costly to defer every specific decision to the entire public, unless technology is used to aggregate public opinion in some manner, in which case the aggregation technology should itself be subject to a degree of public oversight. It is more efficient to choose representatives from diverse backgrounds to stand in protection of values that naturally emerge from those backgrounds.

      However, equal representation is not enough. Due to patterns of coalition-formation based on salient characteristics such as appearance or language (see Schelling Segregation), visible minority groups will systematically suffer disadvantages unless actively protected from larger groups.

      Finally, individuals suffering from disabilities are further in need of protection, irrespective of their background.
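      One way to read the three requirements above (broad representation, minority protection, inclusion of disabled persons) is as a stratified sortition procedure: guarantee each protected group a floor of seats, then fill the rest by uniform lottery. The sketch below is purely illustrative, not any algorithm the text commits to; `stratified_sortition` and its `min_per_group` parameter are hypothetical names introduced here.

```python
import random

def stratified_sortition(population, group_of, seats, min_per_group):
    """Select `seats` members at random while guaranteeing every group
    at least `min_per_group` seats (hypothetical illustration only)."""
    by_group = {}
    for person in population:
        by_group.setdefault(group_of(person), []).append(person)

    # First pass: give each group its guaranteed floor of seats.
    chosen = []
    for members in by_group.values():
        chosen.extend(random.sample(members, min(min_per_group, len(members))))
    assert len(chosen) <= seats, "floor guarantees exceed committee size"

    # Second pass: fill the remaining seats by uniform lottery.
    remaining = [p for p in population if p not in chosen]
    chosen.extend(random.sample(remaining, seats - len(chosen)))
    return chosen

# Example: a 100-person population with a 10% visible minority.
random.seed(0)
people = [(i, "minority" if i % 10 == 0 else "majority") for i in range(100)]
panel = stratified_sortition(people, lambda p: p[1], seats=12, min_per_group=2)
```

      A plain lottery would leave a 10% minority with a meaningful chance of zero seats on a small committee; the floor guarantee removes that failure mode while keeping the remaining seats fair.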

      State of progress:

      The October 2023 US Executive Order expresses intentions for the US Federal Government “identifying and circulating best practices for agencies to attract, hire, retain, train, and empower AI talent, including diversity, inclusion, and accessibility best practices”.

      However, these intentions are barely nascent, and do not as yet seem on track to yield transparent and algorithmically principled solutions, which should be made a high priority.

    6. Global Off-Switches: Humanity should collectively maintain the ability to gracefully shut down AI technology at a global scale, in case of emergencies caused by AI. National and local shutdown capacities are also advisable, but not sufficient because AI is so easily copied across jurisdictions. “Fire drills” to prepare society for local and global shutdown events are needed to reduce harm from shutdowns, and to maintain willingness to use them.

      Problem being addressed:

      It is normal for a company to take its servers offline in order to perform maintenance. Humanity as a whole needs a similar capacity for all AI technology. Without the ability to shut down AI technology, we remain permanently vulnerable to any rogue AI system that surpasses our mainline defenses. Without practice using shutdowns, we will become over-dependent on AI technology, and unable to even credibly threaten to shut it down.

      Why this approach:

      This approach should be relatively easy to understand by analogy with “fire drills” practiced by individual buildings or cities prone to fire emergencies, since rogue AI, just like fires, would be capable of spreading very quickly and causing widespread harm.

      There is certainly a potential for off-switches to be abused by bad actors, and naturally measures would have to be taken to guard against such abuse. However, the argument “Don’t create off-switches because they could be abused” should not be a crux, especially since the analogous argument, “Don’t create AGI because it could be abused”, is currently not carrying enough weight to stop AGI development.

      State of progress:

      To our knowledge, nothing resembling a global or even local AI shutdown capacity exists, except for crude local measures such as power outages or electromagnetic interference with electronics. Even if such capacities exist in secret, they are not being practiced, and strong commercial pressure pushes computing infrastructure companies to keep server uptimes near 100% (see the Strasbourg datacenter fire of 2021). Thus society is unprepared for AI shutdown events, and as a result, anyone considering executing a national-scale or global shutdown of AI technology will face an extremely high burden of proof to justify their actions. This in turn will result in under-use and atrophy of shutdown capacities.

online resources


my own views


  • MIRI: the oldest x-risk organisation (started in 2000 by eliezer yudkowsky as the singularity institute) that, after selling the “singularity” brand and the singularity summit conference, has concentrated on solving various math problems that stand in the way of developing powerful AI agents that would behave in a predictable manner;
  • CFAR: another offshoot of the singularity institute, providing rationality training. i’ve attended their workshop myself and i’m now sponsoring young estonian science olympiad winners to attend their workshops. CFAR also runs a “SPARC” workshop aimed at young math talent: the goal is to introduce upcoming mathematicians to rationality and x-risk issues, and potentially involve them in the kind of work MIRI is doing;
  • FHI: the oldest x-risk organisation in academia. established in 2005 by the oxford philosopher nick bostrom whose latest book (“superintelligence”) has caused a big shift in the AI risk discourse. given the importance and diversity of their research, i consider FHI to be one of the best places on the planet for interesting and profound discussions;
  • CSER: an x-risk organisation at cambridge university that i co-founded with professor huw price and lord martin rees (who was the master of trinity college at the time). initially its main contribution was to lend credibility to x-risks by associating a long list of high-profile scientists with the cause (plus publishing articles and giving talks). lately CSER has successfully applied for several grants (most notably winning a £10M leverhulme grant in 2015) and has started to gear up its research capacity;
  • CFI: the leverhulme centre for the future of intelligence, the research organisation co-founded by CSER with the £10M grant from the leverhulme trust;
  • FLI: the first academic x-risk organisation in the US. led by my good friend max tegmark who is a professor at MIT and has relevant experience running FQXi. i’m one of the 5 co-founders myself. FLI made a splash in 2015 by organising an impressive AI risk conference in puerto rico and, with a subsequent $10M donation from elon musk, creating a grant program for AI safety research;
  • 80k Hours: an organisation within the effective altruism movement that provides career coaching and advice for young people interested in maximising their contribution to the future. the nice thing about EA is that it is finally a working solution to the “tragedy of the commons” problem that x-risk reduction is plagued with: there is no way to profit from x-risk reduction. however, once people actually start doing calculations on how to maximise the impact of altruistic activities, they quite reliably end up supporting x-risk reduction. in my view, the main value that 80kh brings to the table is to potentially address the “talent bottleneck” in x-risk research by directing more young people to it;
  • GCRI: a small org in new york, doing academic research into catastrophic risks;
  • AI Impacts: a cross between MIRI and FHI, started by longtime SIAI/MIRI/FHI researcher katja grace;
  • CHAI: stuart russell’s centre at uc berkeley. stuart russell is the co-author of the leading AI textbook and has become the leading advocate in academia for long-term AI safety;
  • SFF: a fund that’s helping me to do most of my philanthropy these days (in that role, SFF is a successor to BERI).

see also larks’ annual AI alignment literature reviews for more detailed overviews of the work the AI alignment ecosystem produces.