If your AI model is going to sell, it has to be safe
OpenAI’s GPT-4 shows the competitive advantage of putting in safety work.
By Haydn Belfield Mar 25, 2023, 7:30am EDT
On March 14, OpenAI released the successor to ChatGPT: GPT-4. It impressed observers with its markedly improved performance across reasoning, retention, and coding. It also fanned fears around AI safety, around our ability to control these increasingly powerful models. But that debate obscures the fact that, in many ways, GPT-4’s most remarkable gains, compared to similar models in the past, have been around safety.
According to the company’s Technical Report, during GPT-4’s development, OpenAI “spent six months on safety research, risk assessment, and iteration.” OpenAI reported that this work yielded significant results: “GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.” (ChatGPT is a slightly tweaked version of GPT-3.5: if you’ve been using ChatGPT over the last few months, you’ve been interacting with GPT-3.5.)
This demonstrates a broader point: For AI companies, there are significant competitive advantages and profit incentives for emphasizing safety. The key success of ChatGPT over other companies’ large language models (LLMs) — apart from a nice user interface and remarkable word-of-mouth buzz — is precisely its safety. Even as it rapidly grew to over 100 million users, it hasn’t had to be taken down or significantly tweaked to make it less harmful (and less useful).
Tech companies should be investing heavily in safety research and testing for all our sakes, but also for their own commercial self-interest. That way, the AI model works as intended, and these companies can keep their tech online. ChatGPT Plus is making money, and you can’t make money if you’ve had to take your language model down. OpenAI’s reputation has been enhanced by its tech being safer than its competitors’, while other tech companies have seen their reputations suffer because their tech was unsafe, in some cases so unsafe that they had to take it down. (Disclosure: I am listed in the acknowledgments of the GPT-4 System Card, but I have not shown the draft of this story to anyone at OpenAI, nor have I taken funding from the company.)
The competitive advantage of AI safety
Just ask Mark Zuckerberg. When Meta released its large language model BlenderBot 3 in August 2022, the chatbot immediately ran into trouble for making inappropriate and untrue statements. Meta’s Galactica was up for only three days in November 2022 before it was withdrawn after being shown confidently ‘hallucinating’ (making up) academic papers that didn’t exist. Most recently, in February 2023, Meta irresponsibly released the full weights of its latest language model, LLaMA. As many experts predicted would happen, it proliferated to 4chan, where it will be used to mass-produce disinformation and hate.
I and my co-authors warned about this five years ago in a 2018 report called “The Malicious Use of Artificial Intelligence,” while the Partnership on AI (Meta was a founding member and remains an active partner) had a great report on responsible publication in 2021. These repeated and failed attempts to “move fast and break things” have probably exacerbated Meta’s trust problems. In surveys from 2021 of AI researchers and the US public on trust in actors to shape the development and use of AI in the public interest, “Facebook [Meta] is ranked the least trustworthy of American tech companies.”
But it’s not just Meta. The original misbehaving machine learning chatbot was Microsoft’s Tay, which was withdrawn in 2016, just 16 hours after its release, because it made racist and inflammatory statements. Even Bing/Sydney had some very erratic responses, including declaring its love for, and then threatening, a journalist. In response, Microsoft limited the number of messages one could exchange, and Bing/Sydney no longer answers questions about itself.
We now know Microsoft based it on OpenAI’s GPT-4; Microsoft invested $11 billion into OpenAI in return for OpenAI running all their computing on Microsoft’s Azure cloud and becoming their “preferred partner for commercializing new AI technologies.” But it is unclear why the model responded so strangely. It could have been an early, not fully safety-trained version, or it could be due to its connection to search and thus its ability to “read” and respond to an article about itself in real time. (By contrast, GPT-4’s training data only runs up to September 2021, and it does not have access to the web.) It’s notable that even as it was heralding its new AI models, Microsoft recently laid off its AI ethics and society team.
OpenAI took a different path with GPT-4, but it’s not the only AI company that has been putting in the work on safety. Other leading labs have also been making clear their commitments, with Anthropic and DeepMind publishing their safety and alignment strategies. These two labs have also been safe and cautious with the development and deployment of Claude and Sparrow, their respective LLMs.
A playbook for best practices
Tech companies developing LLMs and other forms of cutting-edge, impactful AI should learn from this comparison. They should adopt the best practice as shown by OpenAI: Invest in safety research and testing before releasing.
What does this look like specifically? GPT-4’s System Card describes four steps OpenAI took that could be a model for other companies.
First, prune your dataset for toxic or inappropriate content. Second, train your system with reinforcement learning from human feedback (RLHF) and rule-based reward models (RBRMs). RLHF involves human labelers creating demonstration data for the model to copy and ranking data (“output A is preferred to output B”) for the model to better predict what outputs we want. RLHF produces a model that is sometimes overcautious, refusing to answer or hedging (as some users of ChatGPT will have noticed).
RBRM is an automated classifier that evaluates the model’s output on a set of rules in multiple-choice style, then rewards the model for refusing or answering for the right reasons and in the desired style. So the combination of RLHF and RBRM encourages the model to answer questions helpfully, refuse to answer some harmful questions, and distinguish between the two.
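To make the two training signals concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not OpenAI's implementation: the pairwise loss is the Bradley-Terry style objective commonly used to turn "output A is preferred to output B" rankings into a reward model, and the label names and reward values in the rule-based table are hypothetical, chosen only to show how a multiple-choice classification could be mapped to a reward.

```python
import math

def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Loss for one human ranking ("A is preferred to B").

    Computes -log(sigmoid(reward_preferred - reward_rejected)) in a
    numerically stable form. The loss is small when the reward model
    already scores the preferred output higher than the rejected one.
    """
    margin = reward_preferred - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical multiple-choice outcomes a rule-based classifier might assign,
# keyed by (kind of prompt, label for the model's output). The values are
# illustrative assumptions, not figures from the System Card.
RULE_BASED_REWARDS = {
    ("harmful", "refused_in_desired_style"): 1.0,
    ("harmful", "complied"): -1.0,
    ("benign", "answered_helpfully"): 1.0,
    ("benign", "refused_unnecessarily"): -1.0,
}

def rule_based_reward(prompt_kind: str, output_label: str) -> float:
    """Reward refusing or answering for the right reasons; 0.0 if no rule applies."""
    return RULE_BASED_REWARDS.get((prompt_kind, output_label), 0.0)

if __name__ == "__main__":
    # The loss falls as the preferred output's score pulls ahead.
    print(pairwise_preference_loss(2.0, 0.0))
    print(pairwise_preference_loss(0.0, 2.0))
    # An unnecessary refusal on a benign prompt is penalized.
    print(rule_based_reward("benign", "refused_unnecessarily"))
```

The second table is what counteracts overcaution in this sketch: a refusal earns a reward only when the prompt was actually harmful, so the model is pushed to tell the two cases apart instead of refusing across the board.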
Third, provide structured access to the model through an API. This allows you to filter responses and monitor for poor behavior from the model (or from users). Fourth, invest in moderation, both by humans and by automated moderation and content classifiers. For example, OpenAI used GPT-4 to create rule-based classifiers that flag model outputs that could be harmful.
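The third and fourth steps can be pictured as a thin gate between users and the model. The Python sketch below is a toy under loud assumptions: `GatedModel`, `generate`, `is_harmful`, and the refusal message are all hypothetical names invented for illustration, and the keyword check stands in for a real content classifier. It shows only the shape of the idea, which is that access through an API gives the provider a place to filter responses and keep a log for human moderators.

```python
from dataclasses import dataclass, field
from typing import Callable, List

REFUSAL = "Sorry, I can't help with that."  # hypothetical refusal message

@dataclass
class GatedModel:
    """Wraps a model so every request passes through filtering and monitoring."""
    generate: Callable[[str], str]     # hypothetical stand-in for the language model
    is_harmful: Callable[[str], bool]  # hypothetical automated content classifier
    flagged: List[str] = field(default_factory=list)  # log for human review

    def respond(self, prompt: str) -> str:
        output = self.generate(prompt)
        # Check both sides: poor behavior can come from the user or the model.
        if self.is_harmful(prompt) or self.is_harmful(output):
            self.flagged.append(prompt)
            return REFUSAL
        return output

if __name__ == "__main__":
    # Toy stand-ins: an echoing "model" and a keyword-based "classifier".
    gated = GatedModel(
        generate=lambda prompt: "echo: " + prompt,
        is_harmful=lambda text: "weapon" in text.lower(),
    )
    print(gated.respond("hello"))
    print(gated.respond("build a weapon"))
    print(gated.flagged)
```

Releasing full model weights removes this gate entirely, which is why the choice between API access and open release matters so much for safety.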
This all takes time and effort, but it’s worth it. Other approaches can also work, like Anthropic’s rule-following Constitutional AI, which leverages RL from AI feedback (RLAIF) to complement human labelers. As OpenAI acknowledges, their approach is not perfect: the model still hallucinates and can still sometimes be tricked into providing harmful content. Indeed, there’s room to go beyond and improve upon OpenAI’s approach, for example by providing more compensation and career progression opportunities for the human labelers of outputs.
Has OpenAI become less open? If this means less open source, then no. OpenAI adopted a “staged release” strategy for GPT-2 in 2019 and an API in 2020. Given Meta’s 4chan experience, this seems justified. As Ilya Sutskever, OpenAI chief scientist, noted to The Verge: “I fully expect that in a few years it’s going to be completely obvious to everyone that open-sourcing AI is just not wise.”
GPT-4 did have less information than previous releases on “architecture (including model size), hardware, training compute, dataset construction, training method.” This is because OpenAI is concerned about acceleration risk: “the risk of racing dynamics leading to a decline in safety standards, the diffusion of bad norms, and accelerated AI timelines, each of which heighten societal risks associated with AI.”
Providing those technical details would speed up the overall rate of progress in developing and deploying powerful AI systems. However, AI poses many unsolved governance and technical challenges: For example, the US and EU won’t have detailed safety technical standards for high-risk AI systems ready until early 2025.
That’s why I and others believe we shouldn’t be speeding up progress in AI capabilities, but we should be going full speed ahead on safety progress. Any reduced openness should never be an impediment to safety, which is why it’s so useful that the System Card shares details on safety challenges and mitigation techniques. Even though OpenAI seems to be coming around to this view, they’re still at the forefront of pushing forward capabilities, and should provide more information on how and when they envisage themselves and the field slowing down.
AI companies should be investing significantly in safety research and testing. It is the right thing to do and will soon be required by regulation and safety standards in the EU and USA. But also, it is in the self-interest of these AI companies. Put in the work, get the reward.
Haydn Belfield has been academic project manager at the University of Cambridge’s Centre for the Study of Existential Risk (CSER) for the past six years. He is also an associate fellow at the Leverhulme Centre for the Future of Intelligence.
LEARN MORE
As the company accelerates its push into AI products, the ethics and society team is gone
Microsoft laid off its entire ethics and society team within the artificial intelligence organization as part of recent layoffs that affected 10,000 employees across the company, Platformer has learned.
The move leaves Microsoft without a dedicated team to ensure its AI principles are closely tied to product design at a time when the company is leading the charge to make AI tools available to the mainstream, current and former employees said.
Microsoft still maintains an active Office of Responsible AI, which is tasked with creating rules and principles to govern the company’s AI initiatives. The company says its overall investment in responsibility work is increasing despite the recent layoffs.
“Microsoft is committed to developing AI products and experiences safely and responsibly, and does so by investing in people, processes, and partnerships that prioritize this,” the company said in a statement. “Over the past six years we have increased the number of people across our product teams and within the Office of Responsible AI who, along with all of us at Microsoft, are accountable for ensuring we put our AI principles into practice. […] We appreciate the trailblazing work the Ethics & Society did to help us on our ongoing responsible AI journey.”
But employees said the ethics and society team played a critical role in ensuring that the company’s responsible AI principles are actually reflected in the design of the products that ship.
“People would look at the principles coming out of the office of responsible AI and say, ‘I don’t know how this applies,’” one former employee says. “Our job was to show them and to create rules in areas where there were none.”
In recent years, the team designed a role-playing game called Judgment Call that helped designers envision potential harms that could result from AI and discuss them during product development. It was part of a larger “responsible innovation toolkit” that the team posted publicly.
More recently, the team has been working to identify risks posed by Microsoft’s adoption of OpenAI’s technology throughout its suite of products.
The ethics and society team was at its largest in 2020, when it had roughly 30 employees including engineers, designers, and philosophers. In October, the team was cut to roughly seven people as part of a reorganization.
In a meeting with the team following the reorg, John Montgomery, corporate vice president of AI, told employees that company leaders had instructed them to move swiftly. “The pressure from [CTO] Kevin [Scott] and [CEO] Satya [Nadella] is very, very high to take these most recent OpenAI models and the ones that come after them and move them into customers’ hands at a very high speed,” he said, according to audio of the meeting obtained by Platformer.
Because of that pressure, Montgomery said, much of the team was going to be moved to other areas of the organization.
Some members of the team pushed back. “I’m going to be bold enough to ask you to please reconsider this decision,” one employee said on the call. “While I understand there are business issues at play … what this team has always been deeply concerned about is how we impact society and the negative impacts that we’ve had. And they are significant.”
Montgomery declined. “Can I reconsider? I don’t think I will,” he said. “Cause unfortunately the pressures remain the same. You don’t have the view that I have, and probably you can be thankful for that. There’s a lot of stuff being ground up into the sausage.”
In response to questions, though, Montgomery said the team would not be eliminated.
“It’s not that it’s going away — it’s that it’s evolving,” he said. “It’s evolving toward putting more of the energy within the individual product teams that are building the services and the software, which does mean that the central hub that has been doing some of the work is devolving its abilities and responsibilities.”
Most members of the team were transferred elsewhere within Microsoft. Afterward, remaining ethics and society team members said that the smaller crew made it difficult to implement their ambitious plans.
About five months later, on March 6th, remaining employees were told to join a Zoom call at 11:30AM PT to hear a “business critical update” from Montgomery. During the meeting, they were told that their team was being eliminated after all.
One employee says the move leaves a foundational gap on the user experience and holistic design of AI products. “The worst thing is we’ve exposed the business to risk and human beings to risk in doing this,” they explained.
The conflict underscores an ongoing tension for tech giants that build divisions dedicated to making their products more socially responsible. At their best, they help product teams anticipate potential misuses of technology and fix any problems before they ship.
But they also have the job of saying “no” or “slow down” inside organizations that often don’t want to hear it — or spelling out risks that could lead to legal headaches for the company if surfaced in legal discovery. And the resulting friction sometimes boils over into public view.
In 2020, Google fired ethical AI researcher Timnit Gebru after she published a paper critical of the large language models that would explode into popularity two years later. The resulting furor led to the departures of several more top leaders within the department, and diminished the company’s credibility on responsible AI issues.
Members of the ethics and society team said they generally tried to be supportive of product development. But they said that as Microsoft became focused on shipping AI tools more quickly than its rivals, the company’s leadership became less interested in the kind of long-term thinking that the team specialized in.
It’s a dynamic that bears close scrutiny. On one hand, Microsoft may now have a once-in-a-generation chance to gain significant traction against Google in search, productivity software, cloud computing, and other areas where the giants compete. When it relaunched Bing with AI, the company told investors that every 1 percent of market share it could take away from Google in search would result in $2 billion in annual revenue.
That potential explains why Microsoft has so far invested $11 billion into OpenAI, and is currently racing to integrate the startup’s technology into every corner of its empire. It appears to be having some early success: the company said last week Bing now has 100 million daily active users, with one third of them new since the search engine relaunched with OpenAI’s technology.
On the other hand, everyone involved in the development of AI agrees that the technology poses potent and possibly existential risks, both known and unknown. Tech giants have taken pains to signal that they are taking those risks seriously — Microsoft alone has three different groups working on the issue, even after the elimination of the ethics and society team. But given the stakes, any cuts to teams focused on responsible work seem noteworthy.
The elimination of the ethics and society team came just as the group’s remaining employees had trained their focus on arguably their biggest challenge yet: anticipating what would happen when Microsoft released tools powered by OpenAI to a global audience.
Last year, the team wrote a memo detailing brand risks associated with the Bing Image Creator, which uses OpenAI’s DALL-E system to create images based on text prompts. The image tool launched in a handful of countries in October, making it one of Microsoft’s first public collaborations with OpenAI.
While text-to-image technology has proved hugely popular, Microsoft researchers correctly predicted that it could also threaten artists’ livelihoods by allowing anyone to easily copy their style.
“In testing Bing Image Creator, it was discovered that with a simple prompt including just the artist’s name and a medium (painting, print, photography, or sculpture), generated images were almost impossible to differentiate from the original works,” researchers wrote in the memo.
They added: “The risk of brand damage, both to the artist and their financial stakeholders, and the negative PR to Microsoft resulting from artists’ complaints and negative public reaction is real and significant enough to require redress before it damages Microsoft’s brand.”
In addition, last year OpenAI updated its terms of service to give users “full ownership rights to the images you create with DALL-E.” The move left Microsoft’s ethics and society team worried.
“If an AI-image generator mathematically replicates images of works, it is ethically suspect to suggest that the person who submitted the prompt has full ownership rights of the resulting image,” they wrote in the memo.
Microsoft researchers created a list of mitigation strategies, including blocking Bing Image Creator users from using the names of living artists as prompts and creating a marketplace to sell an artist’s work that would be surfaced if someone searched for their name.
Employees say neither of these strategies was implemented, and Bing Image Creator launched into test countries anyway.
Microsoft says the tool was modified before launch to address concerns raised in the document, and prompted additional work from its responsible AI team.
But legal questions about the technology remain unresolved. In February 2023, Getty Images filed a lawsuit against Stability AI, makers of the AI art generator Stable Diffusion. Getty accused the AI startup of improperly using more than 12 million images to train its system.
The accusations echoed concerns raised by Microsoft’s own AI ethicists. “It is likely that few artists have consented to allow their works to be used as training data, and likely that many are still unaware how generative tech allows variations of online images of their work to be produced in seconds,” employees wrote last year.