Research Note: Sealed Computation
How can we prove that an AI output was produced at a particular time and place, using a particular GPU, all without having knowledge of the actual workload?
- Verification
- Inference
- Accelerator
Inference Certificates
As a prerequisite for the virtuality.network, we need to enable organizations which host inference workloads to prove the following about a particular AI output:
- It was generated during this time period.
- It was generated in this geographical region.
- It was generated using this unique chip.
- The generation workload was appropriately monitored.
In addition, to make it as easy as possible for Hosts to accede to the structured consortium, the techniques which facilitate these guarantees need to satisfy the following properties:
- Hosts do not need to publish their inference logic.
- Workloads can connect to the internet by default.
- Producing inference certificates incurs negligible overhead.
- Workloads require an additional interface if and only if Hosts want them to produce inference certificates. As a corollary, unmodified inference workloads can run normally if they do not need to produce such proofs.
Proving that a particular output was generated in specific circumstances while only incurring a negligible Host burden is non-trivial, but we believe that it can be done. In the rest of this post, we lay out a protocol for achieving such proofs.
Generalized Statement
Let func be a computable function which takes in an input object and produces an output object after some sequence of operations. In the context of AI inference, this function might take the form of an inference workload which generates a completion based on a given prompt.
Now, func can actually be computed on a range of computational substrates. Perhaps the algorithm is run on an Intel CPU, perhaps it is run on an Nvidia GPU. What's important here is that the function gets evaluated for a given input, regardless of how the computation is carried out.
The challenge is to produce some document which proves that a given output of func was produced within some bounds of space and time, and to do this without having knowledge of the actual logic implemented by func. Knowing that some computational substrate was present within those tight bounds, such as a unique chip, can then be used to attribute the computation to that substrate. Finally, proofs can rest on premises which involve trust in other parties, though they are strongest when as few assumptions as possible are made.
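To make that target concrete, here is a minimal sketch of one possible shape for such a document; the field names and types are our own illustration, not a fixed format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocalityProof:
    # Hypothetical shape of the proof document described above;
    # every field name here is an illustrative assumption.
    output_hash: str           # commitment to the output of func
    sealed_at: float           # lower temporal bound (Unix timestamp)
    unsealed_at: float         # upper temporal bound (Unix timestamp)
    location_bound_km: float   # radius of the inferred spatial bound
    chip_ids: tuple[str, ...]  # identifiers of hardware inside the bound
    signature: bytes           # issued by a party the verifier trusts
```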
System Architecture
We now attach labels to the parties which would be involved in carrying out a computation in such a regime:
- user is the party which provides an input, wants to obtain an output based on it, and wants to gain confidence in the fact that the computation which turned their input into the output was carried out in specific, controlled circumstances.
- workload is the party which is responsible for actually carrying out that computation. In other words, the workload is capable of processing requests from the user in order to respond to them with desired outputs. The user and workload communicate with each other across a secure channel backed by public key infrastructure. In practice, the workload might be a Docker container.
- embassy is the final party, and it has a unique ability: to seal the workload from the outside world when the workload itself asks it to do so, and to similarly unseal it on request. The user trusts the embassy, but doesn't trust the opaque workload. In practice, the embassy and the workload share an enclave, yet the embassy would match the privileges of the guest OS, while the workload would be containerized.
Proving Locality
The user wants to have the workload process their input. When they do not want a proof of locality, they simply encrypt their inputs using the public key of the workload and send it a request. The workload processes the request as usual and sends the user a response on completion. However, if the user wants not only an output but also guarantees that it was obtained in controlled circumstances, they do the following:
Step 1: Requesting
Instead of only encrypting their input with the workload key, the user jointly encrypts it using both the workload key and a "sealing session" key published by the embassy ahead of time. Then, the user sends the workload their request over a secure channel, as before.
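As a minimal sketch of this step, assuming PyNaCl as the cryptographic library and a simple two-layer construction as one way to realize the joint encryption:

```python
from nacl.public import PrivateKey, SealedBox

# Keys published ahead of time; generated here only for illustration.
workload_key = PrivateKey.generate()
embassy_session_key = PrivateKey.generate()  # the "sealing session" key

def encrypt_request(plaintext: bytes) -> bytes:
    # Inner layer: only the workload can ever read the raw input.
    inner = SealedBox(workload_key.public_key).encrypt(plaintext)
    # Outer layer: the workload cannot peel this off until the embassy
    # hands over the session secret key, which it only does after sealing.
    return SealedBox(embassy_session_key.public_key).encrypt(inner)

request_body = encrypt_request(b"prompt: what is sealed computation?")
```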
Step 2: Sealing
The workload notices that the user has requested the sealed processing of inputs which it can't access without help from the embassy. To fulfill the request, it asks the embassy the following: "Please seal me and provide me with the missing key so I can reveal the inputs and start processing them." The embassy complies with the request, cutting the workload off from the outside world while only preserving a channel between them. In addition to the missing key, the embassy also provides the workload with a signed statement: "I sealed the workload at this timestamp and location. I'm also reporting these system measurements. Yours sincerely, the embassy."
With everything it needs in order to unlock the input now at its disposal, the workload decrypts it and starts computing the output. It can't cheat by asking other computers to do the work for it, because the embassy has followed through with its isolation. It also couldn't have cheated before sealing because it couldn't access the actual inputs. After working through the algorithm and reaching the output, it tells the embassy: "I'm done with processing this input. Also, here's the hash of the output I got. Please unseal me."
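The statement itself could be signed along these lines, again assuming PyNaCl and a JSON encoding of our own choosing; the unsealing statement of Step 3 would look analogous, with the output commitment added:

```python
import json, time
from nacl.signing import SigningKey

embassy_identity = SigningKey.generate()  # long-lived, publicly known key

def sign_seal_statement(nonce: str, measurements: dict, location: dict) -> bytes:
    # Field names are our assumptions; 'location' would come from the
    # geolocation routine described under Binding Claims below, and the
    # nonce is explained at the end of this section.
    statement = {
        "event": "sealed",
        "timestamp": time.time(),      # ideally from a tamper-evident clock
        "location": location,
        "measurements": measurements,  # e.g. measured-boot register values
        "nonce": nonce,                # ties the statement to one request
    }
    # Sign a canonical encoding so verifiers can reproduce the exact bytes.
    payload = json.dumps(statement, sort_keys=True).encode()
    return embassy_identity.sign(payload)
```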
Step 3: Unsealing
The embassy complies with the request, reactivating communications between the workload and the world. In addition, the embassy provides the workload with a signed statement: "The workload committed to an output with this hash. I then unsealed it at this timestamp and location. Here are also some system measurements. Yours sincerely, the embassy." Because the workload has already committed to an output when requesting unsealing, it can't cheat by asking other computers for help afterwards.
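The commitment can be as simple as a hash over the output bytes; a sketch:

```python
import hashlib

def request_unseal(output: bytes) -> dict:
    # The workload commits to its output before communications resume,
    # so it cannot outsource or swap the result once unsealed.
    return {"op": "unseal", "output_hash": hashlib.sha256(output).hexdigest()}
```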
Step 4: Responding
Now, the workload has all the pieces it needs in order to send the user the awaited response. The workload adds in the output, as well as the two statements signed by the embassy. When the user finally receives the response, they can verify that the output matches the commitment endorsed by the embassy. They can also verify that the system measurements match the golden ones, and verify the authenticity of the signatures.
As a final detail, the initial user request must contain a nonce, which would also be included in the statements signed by the embassy. This helps mitigate replay attacks by tying the certificate to a unique random number included in the original request. The user would also be able to verify that the signed statements incorporate the original nonce.
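Putting the pieces together, user-side verification might look like the sketch below, which matches the statement format assumed earlier; the golden measurements and the embassy's verify key are obtained out of band:

```python
import hashlib, json
from nacl.signing import VerifyKey

def verify_response(output: bytes, signed_statements: list[bytes],
                    nonce: str, golden: dict, embassy_key: VerifyKey) -> bool:
    commitment = hashlib.sha256(output).hexdigest()
    commitment_endorsed = False
    for signed in signed_statements:
        # Raises nacl.exceptions.BadSignatureError if a statement was forged.
        statement = json.loads(embassy_key.verify(signed))
        if statement["nonce"] != nonce:          # mitigate replay attacks
            return False
        if statement["measurements"] != golden:  # detect a tampered guest OS
            return False
        if statement.get("output_hash") == commitment:
            commitment_endorsed = True
    # The unsealing statement must endorse the hash of the received output.
    return commitment_endorsed
```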
Seal Directionality
To recap, based on trust in the ability of the embassy to temporarily seal the unknown workload on request, the user gains confidence in the fact that the response they received was produced in a controlled setting. It is possible to work with a seal which attempts to let nothing in, nothing out, and even nothing persisted locally for the period when the seal is active. However, it is worth exploring in more depth the fundamental properties of the seal. It can be described as having three core settings:
- Inward sealing. Can information get inside the seal? In practice, one might implement this by temporarily blocking all inbound connections for a container.
- Weak outward sealing. Can information get outside the seal while it's active? In practice, one might implement this by temporarily blocking all outbound connections (both this and inward sealing are sketched just after this list).
- Strong outward sealing. Can information accessed inside the seal ever get outside, even after unsealing? In practice, one might implement this by blocking outbound flow as before, but by also resetting container memory to a snapshot captured just before the seal was activated.
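As an illustration of the first two settings, and assuming the embassy holds privileges over the workload's network namespace, the seal might be enforced with ordinary iptables rules. This is a sketch rather than a hardened implementation; strong outward sealing would additionally require restoring a pre-seal memory snapshot on unseal:

```python
import subprocess

def run(*rule: str) -> None:
    subprocess.run(["iptables", *rule], check=True)

def seal(inward: bool, outward: bool) -> None:
    # Executed inside the workload's network namespace. Rules are
    # inserted at the top of the chain, so later inserts take precedence.
    if inward:
        run("-I", "INPUT", "-j", "DROP")    # nothing gets in
    if outward:
        run("-I", "OUTPUT", "-j", "DROP")   # nothing gets out
    # Preserve only the loopback channel between workload and embassy.
    run("-I", "INPUT", "-i", "lo", "-j", "ACCEPT")
    run("-I", "OUTPUT", "-o", "lo", "-j", "ACCEPT")

def unseal() -> None:
    # Remove the rules again; assumes both settings were enabled in seal().
    for chain, dev in (("INPUT", "-i"), ("OUTPUT", "-o")):
        run("-D", chain, dev, "lo", "-j", "ACCEPT")
        run("-D", chain, "-j", "DROP")
```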
Inward and outward sealing are independent, and each appears to have interesting applications. Inward sealing is essential for proofs of locality, because we want to guarantee that no computation carried out beyond the trust boundary has fed into the workload output in any way. Outward sealing, especially the strong variant, might be useful for guaranteeing that certain inputs or outputs will not be left with an untrusted workload capable of sending them elsewhere. In the context of AI governance, inward sealing would be useful for binding inference to particular chip identifiers, while outward sealing would be useful for having models engage with sensitive knowledge in order to study their dual-use capabilities.
Binding Claims
In its purest form, sealed computation only helps the user convert trust in the embassy into confidence that the computation was carried out by an isolated process running in the same trust boundary as the embassy. However, there is a gap between this and the neat claims listed at the beginning. When did it happen? Where did it happen? What made it happen? How did it happen?
The key to binding further claims about the bounds within which the computation was carried out lies in having the embassy determine upper bounds for the seal itself. Because the sealed computation is bound by the seal, upper bounds on the seal itself can be used to support claims about specific circumstances of the computation. The tighter these upper bounds on the seal, the more specific the circumstances:
- Time. When it comes to temporal bounds, the embassy can rely on a tamper-evident clock in marking the points in time when the workload was sealed and unsealed, respectively. For all the embassy knows, the workload might have decided to just sleep for one whole minute before actually getting on with processing the inputs. The embassy can only make claims about the sealed processing session as a whole.
- Space. When it comes to spatial bounds, the embassy may itself execute a delay-based geolocation routine. In brief, imagine a network of servers spread around the globe. The embassy can ping these scattered servers and measure the response times. Knowing that these signals won't surpass the speed of light, the embassy can infer bounds on the physical location of the whole enclave through trilateration (see the sketch after this list).
- Compute. When it comes to claims about specific hardware that may have been used in the computation, the embassy may simply query for the local hardware architecture; perhaps the enclave has access to one Nvidia H100. Going beyond that, the embassy may also fetch the unique identifiers burned into the chips during manufacturing, which Nvidia exposes through standard interfaces, for instance. Same as with bounds on time and space, the embassy can only claim that the workload has used at most this hardware. For all the embassy knows, the workload might only respond to the input with a hard-coded string, not even making use of the expensive accelerators it has at its disposal.
- Capability. Bounds on time periods, geographical regions, and available compute are already quite handy as givens of the enclave. However, a whole additional set of claims opens up when considering workload analysis. For instance, we have previously reverse engineered the memory layout of an inference workload in a way that is agnostic to the actual model architecture, decoding strategy, etc. Concurrent work is focused on radically improving the efficiency of these techniques, such that not one additional FLOP needs to be expended by the GPU in order to locate model activations in its memory. Optimizations aside, getting a fix on model activations can then enable interpretability techniques to measure how much of certain capabilities has been consumed to produce the workload outputs.
- Capability (Deferred). While it may be possible to attach claims about capability usage to generated completions, doing that naively might open the door to optimizing inputs against the monitoring in order to evade it. We envision a subtly modified version of the sealed computation protocol, where the user includes in their request a message only accessible to the embassy: "Hi, I'm the AI Safety Institute, here's my ID. Just so you know, I'm having the workload engage with some classified information. As you monitor the workload, please add the neural signatures you're observing to your registry, under capability #42." During subsequent requests, the embassy would only claim that it has updated its meters for the sealed computation. An authorized user, perhaps one acting as a Defender or Rightsholder in the virtuality.network, may later request a reading of the meters.
- Space (User). Let's explore one final flavor of bindable claim. Perhaps not only does the embassy execute a delay-based geolocation routine in the enclave, but the user also needs to carry one out. They include in their request: "Look, here are the coordinates obtained from a measured geolocation procedure." Being able to determine the user's jurisdiction this way, the embassy can meter not only what capabilities are consumed, but by users in what jurisdictions. For Hosts which accede to the virtuality.network, and so consent to enforce regional guidance advanced by Defenders, this may be useful for proving that they haven't sold more than a set amount of 'engaging with CBRN notions' into a certain jurisdiction, for instance.
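To give a feel for the spatial bound in the Space item above, here is a sketch of the speed-of-light argument; the anchor names and round-trip times are hypothetical:

```python
# Speed-of-light bound: a reply cannot return faster than light travels,
# so half the round-trip time caps the distance to each anchor server.
C_KM_PER_S = 299_792.458

def distance_bound_km(rtt_seconds: float) -> float:
    return C_KM_PER_S * rtt_seconds / 2

# Hypothetical measured RTTs (seconds) to anchors with known coordinates.
anchors = {"frankfurt": 0.012, "virginia": 0.095, "tokyo": 0.210}
bounds = {name: distance_bound_km(rtt) for name, rtt in anchors.items()}
# {'frankfurt': ~1799 km, 'virginia': ~14240 km, 'tokyo': ~31478 km}
# Intersecting the spheres around each anchor yields the enclave's region;
# real routines must also subtract network processing delays.
```

Chip identifiers can be queried analogously; for instance, nvidia-smi --query-gpu=uuid --format=csv,noheader reports the immutable UUID of each Nvidia GPU.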
Trust Base
The claims which the embassy can issue may prove powerful in supporting governance initiatives around model inference. However, these inference certificates and proofs of locality all depend on several assumptions.
- Mitigated Embassy Attacks. First, there's the assumption that the embassy itself, and more broadly the guest OS it's embedded in, are not inherently malicious. Backing this assumption is the fact that the code of the embassy, as well as the guest OS, would be open source and so independently auditable. Third parties should be able to independently validate that they're doing strictly what they should be doing in terms of playing their role in the protocol.
- Mitigated External Attacks. Second, there's the assumption that the embassy itself, and more broadly the guest OS, have not been corrupted by external attackers. These would attempt to gain privileged access to the guest OS, either by breaching the trust boundary from without or by escaping the workload isolation from within. To address the former, a hardened guest OS would be used, and it would implement a measured bootchain which extends into the runtime. This would allow the user to notice that the system measurements are not the expected ones, and conclude that the integrity of the embassy has been corrupted. To address the latter, the guest OS would implement container isolation best practices, again independently auditable due to being open source.
- Mitigated Host Attacks. Third, there's the assumption that the Host itself would not corrupt the embassy. Measured boot relies on the guest OS extending registers in the TPM of the machine: "This is the firmware, extend this register with the hash of this bootloader, over to you. This is the bootloader, extend this register with the hash of the kernel, over to you..." (the mechanism is sketched just after this list). The Host can have the TPM report the golden measurements while secretly intervening on the guest OS. To address this, TEEs shield workloads from the host machine. That said, they still assume that the hardware itself is not malicious or corrupted, though there are active efforts to ensure integrity across the hardware supply chain. Organizations in finance, defense, and other regulated industries need to process sensitive data, and compute providers want to access these lucrative business opportunities.
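The register extension that measured boot relies on can be sketched in a few lines; the component blobs are stand-ins:

```python
import hashlib

def extend(pcr: bytes, component: bytes) -> bytes:
    # TPM-style extension: each value binds the previous one, so no stage
    # can be rewritten without changing every measurement after it.
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()

pcr = b"\x00" * 32  # registers start zeroed at boot
for blob in (b"<bootloader>", b"<kernel>", b"<guest OS>"):  # stand-ins
    pcr = extend(pcr, blob)
# A verifier compares the final value against the golden measurement.
```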
That said, TEEs are not bullet-proof. Zero-days have been repeatedly found in TEE stacks, and cyber-capable state actors are likely to possess such exploits. Yet by riding on advances in security motivated by lucrative-yet-regulated markets, sealed computation can take advantage of state-of-the-art security to narrow the assumptions that support proofs of locality. For the time being, having to assume that a motivated state actor is not interfering with your request, and that the open source embassy is scrutinized for vulnerabilities, seems like a favorable starting point.
Conclusion
To recap, we introduced inference certificates as a target application, and then worked our way backwards to a simple protocol which addresses the scenario. We briefly interrogated the properties of the seal, as well as the range of claims which proofs of locality may enable. Finally, we assessed the assumptions of these proofs, which are directly tied to the constantly decreasing size of the trusted computing base.
We are working towards deploying such an inference service in order to play the role of the first Host in the virtuality.network. If you would like to support the development and application of such techniques, drop us a message at contact@noemaresearch.com.