Proof-carrying code

Proof-carrying code (PCC) is a software mechanism that allows a host system to verify properties of an application via a formal proof that accompanies the application’s executable code. The host system can quickly verify the validity of the proof, and it can compare the conclusions of the proof to its own security policy to determine whether the application is safe to execute. This is particularly useful for ensuring memory safety (for example, preventing buffer overflows).
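The division of labor this implies can be sketched in a few lines of Python. Everything below is illustrative rather than any real PCC implementation: PccPackage, load, check, and policy are invented names. The point is that the only components the host must trust are its own policy and a small proof checker; the possibly very complex prover that produced the proof runs on the untrusted side and never needs to be trusted.

```python
# A minimal sketch of the PCC hand-off; all names here are invented for
# illustration and do not come from Necula and Lee's system.
from typing import Any, Callable, NamedTuple

class PccPackage(NamedTuple):
    code: Callable[..., Any]   # stand-in for the shipped (native) executable
    proof: Any                 # stand-in for the attached safety proof

def load(pkg: PccPackage,
         check: Callable[[Any, Any, Any], bool],   # small, trusted proof checker
         policy: Any) -> Callable[..., Any]:
    """Host-side loader: admit the code only if its proof establishes the policy."""
    if not check(pkg.code, pkg.proof, policy):
        raise PermissionError("safety proof rejected; code not installed")
    # From here on the code runs natively, with no further runtime checks.
    return pkg.code
```

A real checker verifies that the proof is a valid formal derivation about this particular binary; the sketch only captures the control flow of "validate once, then run at full speed."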

Proof-carrying code was originally described in 1996 by George Necula and Peter Lee.

Packet filter example

The original publication on proof-carrying code in 1996[1] used packet filters as an example: a user-mode application hands the kernel a function, written in machine code, that decides whether the application is interested in processing a particular network packet. Because the packet filter runs in kernel mode, it could compromise the integrity of the system if it contains malicious code that writes to kernel data structures. Traditional approaches to this problem include interpreting a domain-specific packet-filtering language, inserting checks on each memory access (software fault isolation), and writing the filter in a high-level language that the kernel compiles before running it. For code executed as frequently as a packet filter, the first two approaches impose a per-packet performance cost; in-kernel compilation avoids this by compiling the code once, when it is loaded, rather than every time it is executed.
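To make the example concrete, here is the kind of decision a packet filter encodes, written as ordinary Python over the raw packet bytes rather than as the machine code used in the original work. The offsets assume an untagged Ethernet frame carrying IPv4 with a 20-byte IP header, and the port-80 rule is an illustrative choice, not a filter taken from the paper.

```python
def accept_packet(pkt: bytes) -> bool:
    """Toy filter: accept IPv4 TCP packets whose destination port is 80."""
    if len(pkt) < 38:                                  # too short for the fields read below
        return False
    if int.from_bytes(pkt[12:14], "big") != 0x0800:    # EtherType: 0x0800 = IPv4
        return False
    if pkt[23] != 6:                                   # IPv4 protocol field: 6 = TCP
        return False
    dst_port = int.from_bytes(pkt[36:38], "big")       # TCP destination port (20-byte IP header)
    return dst_port == 80
```

In the kernel setting of the paper, a function like this is handed over as machine code and runs on every packet, so an out-of-bounds read or write, say if the length check above were missing or wrong, would touch kernel memory. That is exactly the class of bug the attached safety proof must rule out before the kernel agrees to run the filter.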

With proof-carrying code, the kernel publishes a security policy specifying properties that any packet filter must obey: for example, the packet filter will not access memory outside of the packet and its scratch memory area. The code producer uses a theorem prover to show that the machine code satisfies this policy, and the steps of that proof are recorded and attached to the machine code given to the kernel's program loader. The loader can then rapidly validate the proof, allowing it to thereafter run the machine code without any additional checks. If a malicious party modifies either the machine code or the proof, the resulting proof-carrying code is either invalid or harmless (it still satisfies the security policy).
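The loader-side check can be sketched as follows. This is a deliberately simplified stand-in: the filter is represented as straight-line steps over the packet rather than machine code, the scratch memory area is omitted, and the "proof" is just a table saying which earlier length guard justifies each memory access, which the validator rechecks. In real PCC the proof is a formal derivation (encoded in the LF logical framework) of verification conditions generated from the binary, but the shape of the check, point at a justification and revalidate it cheaply, is the same.

```python
# Toy loader-side validation; the representation and names are invented.
# ("guard", n)   models "return reject unless len(pkt) >= n"
# ("ld", off, w) models "read w bytes at packet offset off"
FILTER_STEPS = [
    ("guard", 38),     # reject packets shorter than 38 bytes
    ("ld", 12, 2),     # EtherType
    ("ld", 23, 1),     # IP protocol
    ("ld", 36, 2),     # TCP destination port
]

# Producer-supplied "proof": for each load (by index), the index of the
# guard that makes it safe.  Producing this is the hard part; checking it
# is cheap, which is the asymmetry PCC exploits.
SAFETY_PROOF = {1: 0, 2: 0, 3: 0}

def validate(steps, proof) -> bool:
    """Run once by the program loader; nothing is checked per packet afterwards."""
    for i, step in enumerate(steps):
        if step[0] != "ld":
            continue
        _, off, width = step
        j = proof.get(i)
        if j is None or j >= i:          # justification must precede the access
            return False
        guard = steps[j]
        if guard[0] != "guard":
            return False
        if off < 0 or off + width > guard[1]:
            return False                 # the cited guard does not cover this read
    return True

assert validate(FILTER_STEPS, SAFETY_PROOF)   # only now would the loader install the filter
```

If either the steps or the proof table is tampered with, validate either rejects the package or the modified code still respects the bounds, mirroring the tamper-resistance property described above.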

Learn More:

  • Proof-Carrying Code, George Necula
  • Abstract: This paper describes proof-carrying code (PCC), a mechanism by which a host system can determine with certainty that it is safe to execute a program supplied (possibly in binary form) by an untrusted source. For this to be possible, the untrusted code producer must supply with the code a safety proof that attests to the code’s adherence to a previously defined safety policy. The host can then easily and quickly validate the proof without using cryptography and without consulting any external agents. In order to gain preliminary experience with PCC, we have performed several case studies. We show in this paper how proof-carrying code might be used to develop safe assembly-language extensions of ML programs. In the context of this case study, we present and prove the adequacy of concrete representations for the safety policy, the safety proofs, and the proof validation. Finally, we briefly discuss how we use proof-carrying code to develop network packet filters that are faster than similar filters developed using other techniques and are formally guaranteed to be safe with respect to a given operating system safety policy.
  • Stuart Russell
  • “So, we build AI systems that know that they don’t know what the objective is, right? The objective is whatever humans want the future to be like. That’s the objective. But the AI system knows that it doesn’t know what that is, right? And this is the different kind of AI system. Initially, it sounds sort of, well, how could that even work, right? How can it pursue an objective if it doesn’t know what the objective is? But we do this all the time, right? You go into a restaurant. The chef wants you to have a good meal, but the chef doesn’t know what you like. So, what do they do? They have a thing called a menu, right? The menu is how you communicate your preferences to the chef. And that way, hopefully, what the chef does is aligned with the culinary future that you are actually interested in, right? So, that’s it, you know. Another interesting example is when you have to buy a birthday present for your loved one. Right? And so, here, this is exactly analogous, right? The only thing you care about is how happy your loved one is with the present, because you don’t know how happy they’re going to be with any particular present you might buy them. This is exactly the situation that the AI system is in. So, what do you do? You have one of your kids find out, right? Or you, you know, you leave pictures of things around the house and see which ones, “Oh, that looks like a nice, you know, sailboat,” or “Really, you know, I really like that watch,” or whatever it might be, right? You know, so you try to get clues. You, you know, you review your past history of failure and you try to get there, right? And, you know, and sometimes you even ask directly, right? So, it’s absolutely entirely possible to define this problem mathematically. And I don’t want to go into too much nerdy stuff, but game theory is the branch of economics that deals with decision problems that involve two or more entities. And so, here, there are at least two entities. There’s at least one machine and at least one human. And so, in game theory, you can literally just formulate this approach mathematically. We call it an assistance game because the robot is supposed to be of assistance to the human. And you can show that, if the machine solves these assistance games, then the human in the game is guaranteed to be happy with the outcome, right? And during the game, information is flowing from the human to the machine about what the human wants. So, the human could make requests, for example. And that’s evidence for the machine. It’s not… It doesn’t become gospel truth. It’s just evidence. It’s a piece of evidence, right?”
  • “So, what you do is you require that the chips themselves are the police, right? That the chips say, “I am not going to run a software object that doesn’t come with an appropriate authority.” Right? And this can be done in two ways. What we do now is we have licenses. Right? And you know, when you download software off the web, your laptop, you know, the browser is checking the authority of the license. You know, is it up to date? Is it from a valid issuer? And it won’t let you run stuff that’s not authorized. But you can actually do something much better than that, which is to require a proof of safety. And the hardware can check that proof. And this is a technology called proof-carrying code that was developed by George Necula, who was one of my colleagues at Berkeley. And so, the hardware can check those proofs. So, you don’t need an authority. It doesn’t have to be that the government gives you a license to run software. It doesn’t matter. The software has to come with a proof. And if the software isn’t safe, the proof won’t work and so the hardware won’t run it. And that approach is, I think, a feasible approach to doing it. But as you can imagine, getting all of that in place, right, sort of replacing the whole stack is going to be a huge undertaking, but we have to do it.”
  • Provably Safe Systems: The Only Path to Controllable AGI, Max Tegmark and Steve Omohundro, 6 September 2023
  • Abstract. AGI [91] safety is of the utmost urgency, since corporations and research labs are racing to build AGI despite prominent AI researchers and business leaders warning that it may lead to human extinction [11]. While governments are drafting AI regulations, there’s little indication that they will be sufficient to resist competitive pressures and prevent the creation of AGI. Median estimates on the forecasting platform Metaculus of the date of AGI’s creation have plummeted over the past few years from many decades away to 2027 [25] or 2032 [24] depending on definitions, with superintelligence expected to follow a few years later [23]. Is Alan Turing correct that we now “have to expect the machines to take control”? If AI safety research remains at current paltry levels, this seems likely. Considering the stakes, the AI safety effort is absurdly small in terms of both funding and the number of people. One analysis [73] estimates that less than $150 million will be spent on AI Safety research this year, while, for example, $63 billion will be spent on cosmetic surgery [14] and $1 trillion on cigarettes [13]. Another analyst estimates [10] that only about one in a thousand AI researchers works on safety. Much of the current AI safety work is focused on “alignment” which attempts to fine-tune deep neural networks so that their behavior becomes more aligned with human preferences. While this is valuable, we believe it is inadequate for human safety, especially given the profusion of open-source AI that can be used maliciously. In the face of the possibility of human extinction, we must adopt a “security mindset” [30] and rapidly work to create designs which will be safe also against adversarial AGIs. With a security mindset, we must design safety both into AGIs and also into the physical, digital, and social infrastructure that they interact with [5]. AGI computations are only dangerous for us when they lead to harmful actions in the world. In current approaches, many researchers seem resigned to having to treat AGIs as inscrutable black boxes. Indeed, some younger researchers seem to subconsciously equate AI with neural networks, or even with transformers. This manifesto argues that such resignation is too pessimistic. Mechanistic interpretability is opening up the black box and, once opened, it can be distilled into formal representations that can be reasoned about. We argue that mathematical proof is humanity’s most powerful tool for controlling AGIs. Regardless of how intelligent a system becomes, it cannot prove a mathematical falsehood or do what is provably impossible. And mathematical proofs are cheap to check with inexpensive, extremely reliable hardware. The behavior of physical, digital, and social systems can be precisely modeled as formal systems and precise “guardrails” can be defined that constrain what actions can occur. Even inscrutable AI systems can be required to provide safety proofs for their recommended actions. These proofs can be precisely validated independent of the alignment status of the AGIs which generated them. There is a critical distinction between an AGI world which turns out to be safe and an AGI world in which humanity has extremely high confidence of safety. If a mathematical proof of safety doesn’t exist, then it is very likely that malicious AGIs will find the vulnerabilities and exploit them [37]. 
    If there is a proof of safety and it is made explicit and machine-checkable, then humanity can trust protected machines to design, implement, and execute safe actions. The design and creation of provably safe AGI and infrastructure is both extremely valuable and can become increasingly routine to implement, thanks to AI-powered advances in automated theorem proving and mechanistic interpretability.