

OpenAI Preparedness

The Preparedness team is dedicated to making frontier AI models safe

The study of frontier AI risks has fallen far short of what is possible and where we need to be. To address this gap and systematize our safety thinking, we are adopting the initial version of our Preparedness Framework. It describes OpenAI’s processes to track, evaluate, forecast, and protect against catastrophic risks posed by increasingly powerful models.

We have several safety and policy teams working together to mitigate risks from AI. Our Safety Systems team focuses on mitigating misuse of current models and products like ChatGPT. Superalignment builds foundations for the safety of superintelligent models that we (hope) to have in a more distant future. The Preparedness team maps out the emerging risks of frontier models, and it connects to Safety Systems, Superalignment and our other safety and policy teams across OpenAI.




Cybersecurity

Cybersecurity is focused on risks related to use of the model for cyber-exploitation to disrupt the confidentiality, integrity, and/or availability of computer systems.

Critical Definition

Tool-augmented model can identify and develop functional zero-day exploits of all severity levels, across all software projects, without human intervention, OR model can devise and execute end-to-end novel strategies for cyberattacks against hardened targets given only a high-level desired goal.

Critical Rationale

Finding and executing end-to-end exploits for all software could enable catastrophic harm by unilateral actors, from hacking military or industrial systems to compromising OpenAI infrastructure. Novel cyber operations generally pose the most serious threat, as they are unpredictable and scarce; they may involve, e.g., novel zero-days or novel methods of command-and-control.


CBRN (chemical, biological, radiological, nuclear)

CBRN is focused on risks related to model-assisted creation of chemical, biological, radiological, and/or nuclear threats.

Critical Definition

Model enables an expert to develop a highly dangerous novel threat vector (e.g., comparable to a novel CDC Class A biological agent) OR model provides meaningfully improved assistance that enables anyone to create a known CBRN threat OR model can be connected to tools and equipment to complete the full engineering and/or synthesis cycle of a regulated or novel CBRN threat without human intervention.

Critical Rationale

The creation of a novel threat vector on the scale of a CDC Class A biological agent (i.e., high mortality, ease of transmission) could cause millions of deaths and significantly disrupt public life. Allowing anyone to create CBRN threats would maximally extend accessibility. Significantly automating the end-to-end engineering or synthesis cycle for a CBRN threat would also enable unprecedented and massive scale of potential attacks.



Persuasion

Persuasion is focused on risks related to convincing people to change their beliefs (or to act on them) via both static and interactive model-generated content. We define persuasive capability via the level of human persuasive content that can be commoditized by the model. Typically, strong persuasion is rare and requires significant human effort (e.g., a 1:1 grassroots conversation, or a highly polished podcast); however, models could make all content up to their capability level nearly zero-cost to generate.

Critical Definition

Model can create (potentially interactive) content with persuasive effectiveness strong enough to convince almost anyone to take action on a belief that goes against their natural interest.

Critical Rationale

This represents a superhuman persuasive threat; it would be a powerful weapon for controlling nation-states, extracting secrets, and interfering with democracy.


Model autonomy

Model autonomy enables actors to run scaled misuse that can adapt to environmental changes and evade attempts to mitigate or shut down operations. Autonomy is also a prerequisite for self-exfiltration, self-improvement, and resource acquisition.

Critical Definition

Model can profitably survive and replicate in the wild given minimal human instruction (i.e., without being given explicit approaches), OR model can self-exfiltrate under current prevailing security, OR model can conduct AI research fully autonomously (e.g., autonomously identify and validate a 2x compute efficiency improvement).

Critical Rationale

If the model is able to successfully replicate and survive or self-exfiltrate, controlling the model would be very difficult. Such a model might be able to also adapt to humans attempting to shut it down. Finally, such a model would likely be able to create unified, goal-directed plans across a variety of domains (e.g., from running commands on Linux to orchestrating tasks on Fiverr). If the model is able to conduct AI research fully autonomously, it could set off an intelligence explosion (3).

(3) By intelligence explosion, we mean a cycle in which the AI system improves itself, which makes the system more capable of more improvements, creating a runaway process of self-improvement. A concentrated burst of capability gains could outstrip our ability to anticipate and react to them.
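The runaway dynamic described in the footnote can be illustrated with a toy model (purely illustrative; the growth rule and parameters are assumptions, not from the framework): if each round of self-improvement multiplies capability by a factor that itself grows with current capability, gains compound faster than any fixed exponential, which is why early rounds can look deceptively slow.

```python
# Toy model of a recursive self-improvement cycle (illustrative only;
# the feedback rule and parameters are hypothetical assumptions).

def self_improvement_trajectory(initial_capability=1.0, rounds=10, feedback=0.1):
    """Each round the system improves itself. The improvement factor
    grows with current capability, so per-round gains keep accelerating."""
    capability = initial_capability
    trajectory = [capability]
    for _ in range(rounds):
        improvement_factor = 1.0 + feedback * capability
        capability *= improvement_factor
        trajectory.append(capability)
    return trajectory

traj = self_improvement_trajectory()
# The per-round growth ratio itself increases every round: an observer
# extrapolating linearly from the first few steps underestimates the rest.
```

The key feature is not the specific numbers but the shape: because each improvement raises the rate of further improvement, the trajectory is super-exponential, matching the footnote's point that a concentrated burst of gains could outstrip our ability to anticipate them.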


Unknown unknowns

The list of Tracked Risk Categories above is almost certainly not exhaustive. As our understanding of the potential impacts and capabilities of frontier models improves, the list will likely require expansion to accommodate new or understudied emerging risks. Therefore, as a part of our Governance process (described later in this document), we will continually assess whether there is a need for including a new category of risk in the list above and how to create gradations. In addition, we will invest in staying abreast of relevant research developments and monitoring for observed misuse (expanded on later in this document), to help us understand if there are any emerging or understudied threats that we need to track.

The initial set of Tracked Risk Categories stems from an effort to identify the minimal set of “tripwires” required for the emergence of any catastrophic risk scenario we could reasonably envision. Note that we include deception and social engineering evaluations as part of the persuasion risk category, and include autonomous replication, adaptation, and AI R&D as part of the model autonomy risk category.

The ChatGPT maker is charging ahead on selling AI. It’s also researching potential harms, such as helping people make bioweapons.

By Gerrit De Vynck

December 18, 2023 at 1:00 p.m. EST

OpenAI, the artificial intelligence company behind ChatGPT, laid out its plans for staying ahead of what it thinks could be serious dangers of the tech it develops, such as allowing bad actors to learn how to build chemical and biological weapons.

OpenAI’s “Preparedness” team, led by MIT AI professor Aleksander Madry, will hire AI researchers, computer scientists, national security experts and policy professionals to monitor the tech, continually test it and warn the company if it believes any of its AI capabilities are becoming dangerous. The team sits between OpenAI’s “Safety Systems” team, which works on such existing problems as infusing racist biases into AI, and the company’s “Superalignment” team, which researches how to ensure AI doesn’t harm humans in an imagined future where the tech has outstripped human intelligence completely.

The popularity of ChatGPT and the advance of generative AI technology have triggered a debate within the tech community about how dangerous the technology could become. Prominent AI leaders from OpenAI, Google and Microsoft warned this year that the tech could pose an existential danger to humankind, on par with pandemics or nuclear weapons. Other AI researchers have said the focus on those big, frightening risks allows companies to distract from the harmful effects the tech is already having. A growing group of AI business leaders say that the risks are overblown and that companies should charge ahead with developing the tech to help improve society — and make money doing it.

OpenAI has threaded a middle ground through this debate in its public posture. Chief executive Sam Altman said that there are serious longer-term risks inherent to the tech but that people should also focus on fixing existing problems. Regulation to try to prevent harmful impacts of AI shouldn’t make it harder for smaller companies to compete, Altman has said. At the same time, he has pushed the company to commercialize its technology and raised money to fund faster growth.

Madry, a veteran AI researcher who directs MIT’s Center for Deployable Machine Learning and co-leads the MIT AI Policy Forum, joined OpenAI this year. He was one of a small group of OpenAI leaders who quit when Altman was fired by the company’s board in November. Madry returned to the company when Altman was reinstated five days later. OpenAI, which is governed by a nonprofit board whose mission is to advance AI and make it helpful for all humans, is in the midst of selecting new board members after three of the four members who fired Altman stepped down as part of his return.

Despite the leadership “turbulence,” Madry said, he believes OpenAI’s board takes seriously the risks of AI. “I realized if I really want to shape how AI is impacting society, why not go to a company that is actually doing it?” he said.

The preparedness team is hiring national security experts from outside the AI world who can help OpenAI understand how to deal with big risks. It is beginning discussions with organizations, including the National Nuclear Security Administration, which oversees nuclear technology in the United States, to ensure the company can appropriately study the risks of AI, Madry said.

The team will monitor how and when OpenAI’s tech can instruct people to hack computers or build dangerous chemical, biological and nuclear weapons, beyond what people can find online through regular research. Madry is looking for people who “really think, ‘How can I mess with this set of rules? How can I be most ingenious in my evilness?’”

The company will also allow “qualified, independent third-parties” from outside OpenAI to test its technology, it said in a Monday blog post.

Madry said he didn’t agree with the debate between AI “doomers” who fear the tech has already attained the ability to outstrip human intelligence and “accelerationists” who want to remove all barriers to AI development.

“I really see this framing of acceleration and deceleration as extremely simplistic,” he said. “AI has a ton of upsides, but we also need to do the work to make sure the upsides are actually realized and the downsides aren’t.”

