FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

“Every expert I talk to says basically the same thing: We have made no progress on interpretability, and while there is certainly a chance we will, it is only a chance. For now, we have no idea what is happening inside these prediction systems… It’s not clear that interpretability is achievable. But without it, we will be turning more and more of our society over to algorithms we do not understand. If you told me you were building a next generation nuclear power plant, but there was no way to get accurate readings on whether the reactor core was going to blow up, I’d say you shouldn’t build it.”

THE NEW YORK TIMES. OPINION. EZRA KLEIN. The Surprising Thing A.I. Engineers Will Tell You if You Let Them. April 16, 2023

By Ezra Klein, Opinion Columnist

Among the many unique experiences of reporting on A.I. is this: In a young industry flooded with hype and money, person after person tells me that they are desperate to be regulated, even if it slows them down. In fact, especially if it slows them down.

What they tell me is obvious to anyone watching. Competition is forcing them to go too fast and cut too many corners. This technology is too important to be left to a race between Microsoft, Google, Meta and a few other firms. But no one company can slow down to a safe pace without risking irrelevancy. That’s where the government comes in — or so they hope.

A place to start is with the frameworks policymakers have already put forward to govern A.I. The two major proposals, at least in the West, are the “Blueprint for an A.I. Bill of Rights,” which the White House put forward in 2022, and the Artificial Intelligence Act, which the European Commission proposed in 2021. Then, last week, China released its latest regulatory approach.

Let’s start with the European proposal, as it came first. The Artificial Intelligence Act tries to regulate A.I. systems according to how they’re used. It is particularly concerned with high-risk uses, which include everything from overseeing critical infrastructure to grading papers to calculating credit scores to making hiring decisions. High-risk uses, in other words, are any uses in which a person’s life or livelihood might depend on a decision made by a machine-learning algorithm.

The European Commission described this approach as “future-proof,” which proved to be predictably arrogant, as new A.I. systems have already thrown the bill’s clean definitions into chaos. Focusing on use cases is fine for narrow systems designed for a specific use, but it’s a category error when it’s applied to generalized systems. Models like GPT-4 don’t do any one thing except predict the next word in a sequence. You can use them to write code, pass the bar exam, draw up contracts, create political campaigns, plot market strategy and power A.I. companions or sexbots. In trying to regulate systems by use case, the Artificial Intelligence Act ends up saying very little about how to regulate the underlying model that’s powering all these use cases.

Unintended consequences abound. The A.I.A. mandates, for example, that in high-risk cases, “training, validation and testing data sets shall be relevant, representative, free of errors and complete.” But what the large language models are showing is that the most powerful systems are those trained on the largest data sets. Those sets can’t plausibly be free of error, and it’s not clear what it would mean for them to be representative. There’s a strong case to be made for data transparency, but I don’t think Europe intends to deploy weaker, less capable systems across everything from exam grading to infrastructure.

The other problem with the use case approach is that it treats A.I. as a technology that will, itself, respect boundaries. But its disrespect for boundaries is what most worries the people working on these systems. Imagine that “personal assistant” is rated as a low-risk use case and a hypothetical GPT-6 is deployed to power an absolutely fabulous personal assistant. The system gets tuned to be extremely good at interacting with human beings and accomplishing a diverse set of goals in the real world. That’s great until someone asks it to secure a restaurant reservation at the hottest place in town and the system decides that the only way to do it is to cause a disruption that leads a third of that night’s diners to cancel their bookings.

Sounds like sci-fi? Sorry, but this kind of problem is sci-fact. Anyone training these systems has watched them come up with solutions to problems that human beings would never consider, and for good reason. OpenAI, for instance, trained a system to play the boat racing game CoastRunners, and built in positive reinforcement for racking up a high score. It was assumed that would give the system an incentive to finish the race. But the system instead discovered “an isolated lagoon where it can turn in a large circle and repeatedly knock over three targets, timing its movement so as to always knock over the targets just as they repopulate.” Choosing this strategy meant “repeatedly catching on fire, crashing into other boats, and going the wrong way on the track,” but it also meant the highest scores, so that’s what the model did.

This is an example of “alignment risk,” the danger that what we want the systems to do and what they will actually do could diverge, and perhaps do so violently. Curbing alignment risk requires curbing the systems themselves, not just the ways we permit people to use them.

The White House’s Blueprint for an A.I. Bill of Rights is a more interesting proposal (and if you want to dig deeper into it, I interviewed its lead author, Alondra Nelson, on my podcast). But where the European Commission’s approach is much too tailored, the White House blueprint may well be too broad. No A.I. system today comes close to adhering to the framework, and it’s not clear that any of them could.

“Automated systems should provide explanations that are technically valid, meaningful and useful to you and to any operators or others who need to understand the system, and calibrated to the level of risk based on the context,” the blueprint says. Love it. But every expert I talk to says basically the same thing: We have made no progress on interpretability, and while there is certainly a chance we will, it is only a chance. For now, we have no idea what is happening inside these prediction systems. Force them to provide an explanation, and the one they give is itself a prediction of what we want to hear — it’s turtles all the way down.

The blueprint also says that “automated systems should be developed with consultation from diverse communities, stakeholders, and domain experts to identify concerns, risks and potential impacts of the system.” This is crucial, and it would be interesting to see the White House or Congress flesh out how much consultation is needed, what type is sufficient and how regulators will make sure the public’s wishes are actually followed.

It goes on to insist that “systems should undergo predeployment testing, risk identification and mitigation, and ongoing monitoring that demonstrate they are safe and effective based on their intended use.” This, too, is essential, but we do not understand these systems well enough to test and audit them effectively. OpenAI would certainly prefer that users didn’t keep jailbreaking GPT-4 to get it to ignore the company’s constraints, but the company has not been able to design a testing regime that comes anywhere close to preventing it.

Perhaps the most interesting of the blueprint’s proposals is that “you should be able to opt out from automated systems in favor of a human alternative, where appropriate.” In that sentence, the devil lurks in the definition of “appropriate.” But the underlying principle is worth considering. Should there be an opt-out from A.I. systems? Which ones? When is an opt-out clause a genuine choice, and at what point does it become merely an invitation to recede from society altogether, like saying you are free not to use the internet or vehicular transport or banking services?

Then there are China’s proposed new rules. I won’t say much on these, except to note that they are much more restrictive than anything the United States or Europe is imagining, which makes me very skeptical of arguments that we are in a race with China to develop advanced artificial intelligence. China seems perfectly willing to cripple the development of general A.I. so it can concentrate on systems that will more reliably serve state interests.

China insists, for example, that “content generated through the use of generative A.I. shall reflect the Socialist Core Values, and may not contain: subversion of state power; overturning of the socialist system; incitement of separatism; harm to national unity; propagation of terrorism or extremism; propagation of ethnic hatred or ethnic discrimination; violent, obscene, or sexual information; false information; as well as content that may upset economic order or social order.”

If China means what it says, its A.I. sector has its work cut out for it. A.I. is advancing so quickly in the United States precisely because we’re allowing unpredictable systems to proliferate. Predictable A.I. is, for now, weaker A.I.

I wouldn’t go as far as China is going with A.I. regulation. But we need to go a lot further than we have — and fast, before these systems get too many users and companies get addicted to profits and start beating back regulators. I’m glad to see that Chuck Schumer, the Senate majority leader, is launching an initiative on A.I. regulation. And I won’t pretend to know exactly what he and his colleagues should do. But after talking to a lot of people working on these problems and reading through a lot of policy papers imagining solutions, there are a few categories I’d prioritize.

The first is the question — and it is a question — of interpretability. As I said above, it’s not clear that interpretability is achievable. But without it, we will be turning more and more of our society over to algorithms we do not understand. If you told me you were building a next generation nuclear power plant, but there was no way to get accurate readings on whether the reactor core was going to blow up, I’d say you shouldn’t build it. Is A.I. like that power plant? I’m not sure. But that’s a question society should consider, not a question that should be decided by a few hundred technologists. At the very least, I think it’s worth insisting that A.I. companies spend a good bit more time and money discovering whether this problem is solvable.

The second is security. For all the talk of an A.I. race with China, the easiest way for China — or any country for that matter, or even any hacker collective — to catch up on A.I. is to simply steal the work being done here. Any firm building A.I. systems above a certain scale should be operating with hardened cybersecurity. It’s ridiculous to block the export of advanced semiconductors to China but to simply hope that every 26-year-old engineer at OpenAI is following appropriate security measures.

The third is evaluations and audits. This is how models will be evaluated for everything from bias to the ability to scam people to the tendency to replicate themselves across the internet.

Right now, the testing done to make sure large models are safe is voluntary, opaque and inconsistent. No best practices have been accepted across the industry, and not nearly enough work has been done to build testing regimes in which the public can have confidence. That needs to change — and fast. Airplanes rarely crash because the Federal Aviation Administration is excellent at its job. The Food and Drug Administration is arguably too rigorous in its assessments of new drugs and devices, but it is very good at keeping unsafe products off the market. The government needs to do more here than just write up some standards. It needs to make investments and build institutions to conduct the monitoring.

The fourth is liability. There’s going to be a temptation to treat A.I. systems the way we treat social media platforms and exempt the companies that build them from the harms caused by those who use them. I believe that would be a mistake. The way to make A.I. systems safe is to give the companies that design the models a good reason to make them safe. Making them bear at least some liability for what their models do would encourage a lot more caution.

The fifth is, for lack of a better term, humanness. Do we want a world filled with A.I. systems that are designed to seem human in their interactions with human beings? Because make no mistake: That is a design decision, not an emergent property of machine-learning code. A.I. systems can be tuned to return dull and caveat-filled answers, or they can be built to show off sparkling personalities and become enmeshed in the emotional lives of human beings.

I think the latter class of programs has the potential to do a lot of good as well as a lot of harm, so the conditions under which they operate should be thought through carefully. It might, for instance, make sense to place fairly tight limits on the kinds of personalities that can be built for A.I. systems that interact with children. I’d also like to see very tight limits on any ability to make money by using A.I. companions to manipulate consumer behavior.

This is not meant to be an exhaustive list. Others will have different priorities and different views. And the good news is that new proposals are being released almost daily. The Future of Life Institute’s policy recommendations are strong, and I think the A.I. Objectives Institute’s focus on the human-run institutions that will design and own A.I. systems is critical. But one thing regulators shouldn’t fear is imperfect rules that slow a young industry. For once, much of that industry is desperate for someone to help slow it down.
