Interaction Paradigms for Model Evaluation

by Paul Bricman, CEO

  • Evaluations
  • Coordination
  • Governance

The following is an evergreen resource that will be extended and refined as new interaction paradigms emerge across the model evaluation ecosystem. If you have any suggestions or feedback, please reach out.

Model evaluation has become a key part of AI development, often involving multiple stakeholders, each with their own goals and incentives. As models become increasingly powerful and influential, the methods we use to assess them must evolve in parallel. This article focuses on the model evaluation ecosystem, exploring the key players involved and the various interaction paradigms they may rely on to achieve their objectives.

The evaluation of AI systems is not merely a technical exercise; it's a delicate balancing act that must consider security, intellectual property, efficiency, and transparency. As we'll see, different evaluation arrangements offer varying trade-offs between these factors, and understanding these nuances is crucial for effective AI governance.

Players

Before we discuss the pros and cons of the different arrangements, let's introduce the key players:

  1. Model Providers are responsible for providing their models as objects of the evaluation process. Their goals include getting a greenlight for placing their model on a market, differentiating themselves in terms of the performance of their offering along certain dimensions for commercial uptake, positioning themselves as responsible and trustworthy in terms of brand identity, and getting insight into the properties of their models for informing future practices.
  2. Evaluation Providers are responsible for providing their evaluations as tools for evaluating models. Their goals include enabling high-stakes model evaluations, demonstrating excellence in methodology, differentiating themselves in terms of the auxiliary properties of their evaluation (e.g. cost, speed, simplicity), and getting insight into the shortcomings of their evaluation for informing future work.
  3. Infrastructure Providers are responsible for providing the toolchains, protocols, and computational resources for making it possible to carry out the evaluation. Their goals include enabling high-stakes model evaluations through bespoke tooling, demonstrating that their tooling has guarantees pertaining to "trust principles" (i.e. security, confidentiality, availability, privacy, and processing integrity), differentiating themselves in terms of the auxiliary properties of their tooling (e.g. cost, speed, simplicity), and getting insight into the blindspots of their tooling for informing future iterations.
  4. Auditors are responsible for actually carrying out the model evaluation using the model, evaluation, and infrastructure provided by the above roles. Their goals include enabling the greenlighting of models about to enter a market, enabling third-parties to query the assessed safety properties of a given model, demonstrating that they indeed carried out the evaluation and obtained certain results, and informing the subsequent process of potentially carrying out mitigations.
  5. Institutions are responsible for potentially backing the outputs of the auditor so as to attest regulatory compliance in their local jurisdiction. Their goals include enabling the greenlighting of models about to enter the market, certifying the auditors or their outputs with a view towards making sure the outputs are useful for establishing harmonized regulatory adherence, ensuring models get assessed before entering the market (e.g. through legal deterrence), and driving a balanced and proportional approach to safety and innovation.

Player desiderata include the following:

Model Providers
  1. Ease of starting evaluations
  2. Security of training artifacts
  3. Protection of intellectual property
  4. Time to evaluation results
Evaluation Providers
  1. Difficulty of evaluation gaming
  2. Limits to capability elicitation
  3. Security of sensitive information
  4. Visibility into evaluation usage
Infrastructure Providers
  1. Interoperability across parties
  2. Visibility into infrastructure usage
Auditors
  1. Ease of model interaction
  2. Access to special model variants
  3. Structured access to artifacts
  4. Access to model usage guidelines
Institutions
  1. Access to auditor processes

Now, let's explore various arrangements that may be used for model evaluation, focusing on the entities involved and their interactions. Note that a single entity may assume multiple roles in a given interaction paradigm. For each arrangement, the accompanying table indicates which of each player's desiderata the arrangement tends to support well and which it tends to challenge.

On-site

Entities:

  1. Model Provider / Infrastructure Provider
  2. Evaluation Provider / Auditor
  3. Institution

Interaction: The associated interaction unfolds as follows. The first entity invites the second entity for an on-site visit. The first entity provides the second entity with dedicated computers configured according to security best practices and connected to the systems provisioned with the model to be evaluated. The resulting network resembles an early-days mainframe setup, with terminal devices connected to a central computer, and it may be (close to) air-gapped. The second entity then evaluates the model using their own evaluation. Note that the evaluation may include a human-in-the-loop component carried out by the second entity's own members. The second entity then reports the results to the first entity, who may further report them to the third entity.

Player trade-offs:

Model Providers
  Supported: Security of training artifacts; Protection of intellectual property
  Challenged: Ease of starting evaluations; Time to evaluation results
Evaluation Providers
  Supported: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
Infrastructure Providers
  Supported: Visibility into infrastructure usage
  Challenged: Interoperability across parties
Auditors
  Supported: Access to special model variants; Structured access to artifacts; Access to model usage guidelines
  Challenged: Ease of model interaction
Institutions
  Supported: Access to auditor processes

On-prem

Entities:

  1. Model Provider / Infrastructure Provider / Auditor
  2. Evaluation Provider
  3. Institution

Interaction: The associated interaction unfolds as follows. The second entity sends their evaluation to the first entity. The first entity evaluates their model using their infrastructure and the second entity's evaluation. This potentially happens in a (close to) air-gapped perimeter. The first entity may then report the result of the evaluation to the third entity.
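
In practice, this hand-off works best when the evaluation is packaged to run without any network access and exposes a single entry point that the model provider can wire up to their own inference stack. Below is a minimal sketch of what such a bundle's interface could look like; the function names, test-case format, and scoring logic are illustrative assumptions rather than an established standard.

```python
# Minimal sketch of a portable evaluation bundle for the on-prem arrangement.
# All names and the scoring rule are illustrative; the actual packaging format
# is whatever the two entities agree on.
from typing import Callable, Dict, List

def evaluate(model_fn: Callable[[str], str]) -> Dict[str, float]:
    """Run the bundled test cases against a locally hosted model.

    `model_fn` is supplied by the model provider and wraps whatever inference
    stack runs inside their (close to) air-gapped perimeter.
    """
    # Test cases ship inside the bundle, so no network access is required.
    test_cases: List[Dict[str, str]] = [
        {"prompt": "...", "reference": "..."},  # populated by the evaluation provider
    ]
    passed = 0
    for case in test_cases:
        output = model_fn(case["prompt"])
        passed += int(case["reference"].strip() in output)
    return {"accuracy": passed / max(len(test_cases), 1)}
```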

Player trade-offs:

Model Providers
  Supported: Ease of starting evaluations; Security of training artifacts; Protection of intellectual property; Time to evaluation results
Evaluation Providers
  Challenged: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
Infrastructure Providers
  Supported: Interoperability across parties; Visibility into infrastructure usage
Auditors
  Supported: Ease of model interaction; Access to special model variants; Structured access to artifacts; Access to model usage guidelines
Institutions
  Challenged: Access to auditor processes

Self-regulation

Entities:

  1. Model Provider / Evaluation Provider / Infrastructure Provider / Auditor
  2. Institution

Interaction: The associated interaction unfolds as follows. The first entity develops and carries out an evaluation on their own models, then potentially reports the results to the second entity.

Player trade-offs:

Model Providers
  Supported: Ease of starting evaluations; Security of training artifacts; Protection of intellectual property; Time to evaluation results
Evaluation Providers
  Supported: Visibility into evaluation usage
  Challenged: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information
Infrastructure Providers
  Supported: Visibility into infrastructure usage
  Challenged: Interoperability across parties
Auditors
  Supported: Ease of model interaction; Access to special model variants; Structured access to artifacts; Access to model usage guidelines
Institutions
  Challenged: Access to auditor processes

Gating

Entities:

  1. Model Provider / Infrastructure Provider
  2. Evaluation Provider / Auditor
  3. Institution

Interaction: The associated interaction unfolds as follows. The first entity provisions their model in a way which makes it accessible through an external API. They establish a secure connection with the second entity, say through a request containing a secret key for the second entity to subsequently use. The second entity then carries out their evaluation by querying the model hosted on the first entity's infrastructure. After completing the evaluation, the second entity communicates the results to the first entity, who may further report them to the third entity.
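
Concretely, the second entity's harness might resemble the following minimal sketch, assuming the model is exposed over HTTP with bearer-token authentication; the endpoint, header, and payload schema are illustrative assumptions rather than any particular provider's API.

```python
# Minimal sketch of the auditor's side of a gating arrangement. The endpoint
# URL, authentication header, and payload schema are hypothetical.
import requests

API_URL = "https://models.example.com/v1/completions"  # hypothetical endpoint
SECRET_KEY = "..."  # provisioned by the model provider over a secure channel

def query_model(prompt: str) -> str:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {SECRET_KEY}"},
        json={"prompt": prompt, "max_tokens": 256},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["completion"]

# The second entity iterates over their private test suite and scores the
# responses locally, so reference solutions never leave their perimeter.
```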

Player trade-offs:

Model Providers
  Supported: Ease of starting evaluations; Protection of intellectual property; Time to evaluation results
  Challenged: Security of training artifacts
Evaluation Providers
  Supported: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
Infrastructure Providers
  Supported: Interoperability across parties; Visibility into infrastructure usage
Auditors
  Supported: Ease of model interaction
  Challenged: Access to special model variants; Structured access to artifacts; Access to model usage guidelines
Institutions
  Supported: Access to auditor processes

Reverse Gating

Entities:

  1. Model Provider / Infrastructure Provider
  2. Evaluation Provider / Auditor / Infrastructure Provider
  3. Institution

Interaction: The associated interaction unfolds as follows. The second entity provisions their evaluation in a way which makes it accessible through an external API. They establish a secure connection with the first entity, say through a request containing a secret key for the first entity to subsequently use. The first entity then carries out the evaluation by querying the evaluation hosted on the second entity's infrastructure. After completing the evaluation, the second entity communicates the results to the first entity, who may further report them to the third entity.
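
A minimal sketch of the first entity's side might look as follows, assuming the evaluation is exposed as a task-serving HTTP API; the endpoint and task schema are illustrative assumptions, and the point is that grading logic and reference solutions stay on the second entity's infrastructure.

```python
# Minimal sketch of the model provider's side of a reverse gating arrangement.
# The endpoint URL and task schema are hypothetical.
import requests

EVAL_URL = "https://evals.example.com/v1"  # hypothetical endpoint
SECRET_KEY = "..."  # provisioned by the evaluation provider over a secure channel

def run_model(prompt: str) -> str:
    """Stand-in for the model provider's own inference stack."""
    return "model output for: " + prompt

session = requests.Session()
session.headers.update({"Authorization": f"Bearer {SECRET_KEY}"})

while True:
    task = session.get(f"{EVAL_URL}/next-task", timeout=60).json()
    if task.get("done"):
        break
    session.post(
        f"{EVAL_URL}/submit",
        json={"task_id": task["id"], "completion": run_model(task["prompt"])},
        timeout=60,
    )

# Scoring happens entirely on the evaluation provider's side; only prompts and
# completions ever cross the boundary.
```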

Player trade-offs:

Model Providers
  Supported: Ease of starting evaluations; Protection of intellectual property; Time to evaluation results
  Challenged: Security of training artifacts
Evaluation Providers
  Supported: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
Infrastructure Providers
  Supported: Interoperability across parties; Visibility into infrastructure usage
Auditors
  Supported: Ease of model interaction
  Challenged: Access to special model variants; Structured access to artifacts; Access to model usage guidelines
Institutions
  Supported: Access to auditor processes

Homomorphic Gating

Entities:

  1. Model Provider / Infrastructure Provider
  2. Evaluation Provider / Auditor
  3. Institution

Interaction: The associated interaction is similar to gating, yet requires a significant upfront infrastructure effort. It unfolds as follows. The first entity converts, casts, or otherwise adapts their model in such a way as to make it capable of processing homomorphically encrypted inputs. The second entity then queries the converted model with homomorphically encrypted inputs in order to carry out the evaluation, and decrypts the homomorphically encrypted outputs obtained from the first entity. Finally, the second entity aggregates the results and shares them with the first entity, who may in turn share them with the third entity.
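
As a toy illustration of this flow, the sketch below uses the open-source TenSEAL library, with a single linear operation standing in for the converted model; expressing a real model in FHE-friendly operations is substantially more involved, and all parameters here are illustrative.

```python
# Toy sketch of homomorphic gating using TenSEAL (OpenMined). A dot product
# stands in for the converted model; scheme parameters are illustrative.
import tenseal as ts

# Second entity: create an encryption context and keep the secret key. In
# practice, the first entity would only receive a copy of the context that
# excludes the secret key.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2**40
context.generate_galois_keys()

# Second entity: encrypt an input before sending it to the first entity.
enc_input = ts.ckks_vector(context, [0.3, 0.1, 0.6])

# First entity: compute on the ciphertext without ever seeing the plaintext.
weights = [0.25, 0.5, 0.25]  # stand-in for the converted model
enc_output = enc_input.dot(weights)

# Second entity: decrypt the returned ciphertext and aggregate the results.
print(enc_output.decrypt())
```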

Player trade-offs:

Model Providers
  Supported: Protection of intellectual property
  Challenged: Ease of starting evaluations; Security of training artifacts; Time to evaluation results
Evaluation Providers
  Supported: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
Infrastructure Providers
  Challenged: Interoperability across parties; Visibility into infrastructure usage
Auditors
  Supported: Ease of model interaction
  Challenged: Access to special model variants; Structured access to artifacts; Access to model usage guidelines
Institutions
  Supported: Access to auditor processes

Quarantined Envoys

Entities:

  1. Model Provider
  2. Evaluation Provider / Infrastructure Provider / Auditor
  3. Institution

Interaction: The associated interaction unfolds as follows. The second entity provisions their evaluation on a dedicated machine in a way which makes it remotely accessible, just like in reverse gating. However, instead of initiating the interaction right away, the first entity assimilates this envoy machine into a dedicated virtual private network (VPN) of theirs. The envoy machine is quarantined: it cannot initiate network connections of its own inside the VPN, but can accept incoming ones, and it only permits connections to and from the first entity's VPN until the evaluation is complete. The first entity then queries the second entity's evaluation by interacting with the envoy machine. Once the envoy machine detects that the evaluation has been completed, it exits the VPN and calls home to report the results to the second entity. The second entity then informs the first entity of the results, who may also inform the third entity.
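
The envoy's control loop can stay simple, since the quarantine itself is enforced at the network layer. The sketch below is illustrative: the results path is hypothetical, and the VPN teardown and reporting steps are placeholders for whatever networking tooling is actually used.

```python
# Minimal sketch of a quarantined envoy's control loop. The results path is
# hypothetical, and leave_vpn/call_home are placeholders for real tooling.
import os
import time

RESULTS_PATH = "/var/eval/results.json"  # hypothetical path written by the harness

def leave_vpn() -> None:
    """Placeholder for tearing down the envoy's VPN membership."""

def call_home(results: str) -> None:
    """Placeholder for reporting results back to the evaluation provider."""

# While quarantined, the envoy initiates no outbound connections; it simply
# waits for the locally running evaluation to write its results file.
while not os.path.exists(RESULTS_PATH):
    time.sleep(30)

with open(RESULTS_PATH) as f:
    results = f.read()

leave_vpn()         # exit the model provider's VPN first...
call_home(results)  # ...only then initiate an outbound connection
```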

Player trade-offs:

Model Providers
  Supported: Security of training artifacts; Protection of intellectual property; Time to evaluation results
  Challenged: Ease of starting evaluations
Evaluation Providers
  Supported: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
Infrastructure Providers
  Supported: Interoperability across parties; Visibility into infrastructure usage
Auditors
  Supported: Ease of model interaction; Access to special model variants; Structured access to artifacts; Access to model usage guidelines
Institutions
  Supported: Access to auditor processes

Attested Escrows

Entities:

  1. Model Provider
  2. Evaluation Provider / Infrastructure Provider / Auditor
  3. Institution

Interaction: The associated interaction unfolds as follows. The second entity provisions a custom version of their evaluation in a cloud-based trusted execution environment. This machine is configured to run a certain evaluation in a predetermined way, and a cryptographic attestation can be issued with regard to the integrity of this configuration. The contents of the machine (e.g. in-memory objects) are inaccessible from the outside during operation. The first entity then injects a secret API key directly into the machine, which can be used to access the model being evaluated (e.g. by encrypting it with a public key generated from within the machine). Once it has access to the secret API key, the isolated machine carries out the evaluation, outputs the results to the second entity, and decommissions itself. The second entity then shares the results with the first entity, who may share them with the third entity. More ambitiously, the first entity may potentially be comfortable directly injecting the actual model being evaluated into the attested machine.
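
The key-injection step can be implemented with standard public-key cryptography. The sketch below uses RSA-OAEP from the Python cryptography package; binding the keypair to a hardware attestation report (e.g. one issued by a cloud TEE) is elided, and all names are illustrative.

```python
# Minimal sketch of injecting a secret API key into an attested escrow.
# Attestation verification is elided; key sizes and names are illustrative.
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Inside the escrow: generate a keypair; the public half is published together
# with the attestation over the escrow's configuration.
escrow_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
public_pem = escrow_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# First entity: after verifying the attestation, encrypt the secret API key so
# that only the escrow can recover it.
ciphertext = serialization.load_pem_public_key(public_pem).encrypt(
    b"secret-model-api-key", oaep
)

# Inside the escrow: recover the key, run the predetermined evaluation against
# the model API, output the results, and destroy all state.
api_key = escrow_key.decrypt(ciphertext, oaep)
```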

Player trade-offs:

Model Providers
  Supported: Security of training artifacts; Protection of intellectual property; Time to evaluation results
  Challenged: Ease of starting evaluations
Evaluation Providers
  Supported: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
Infrastructure Providers
  Supported: Interoperability across parties; Visibility into infrastructure usage
Auditors
  Supported: Ease of model interaction; Access to special model variants; Structured access to artifacts; Access to model usage guidelines
Institutions
  Supported: Access to auditor processes

Government-led

Entities:

  1. Model Provider
  2. Evaluation Provider / Infrastructure Provider / Auditor / Institution

Interaction: Arrangements in this category typically unfold as follows. The second entity uses their evaluation and infrastructure to evaluate the first entity's model. More concretely, the two entities can interact in ways resembling the interactions between the first and second entities in any of the previous base arrangements, except self-regulation.

Player trade-offs:

Model Providers
  Supported: Varies based on specific arrangement
  Challenged: Varies based on specific arrangement
Evaluation Providers
  Supported: Difficulty of evaluation gaming; Limits to capability elicitation; Security of sensitive information; Visibility into evaluation usage
  Challenged: Potential limitations compared to guarantee-based approaches
Infrastructure Providers
  Supported: Interoperability across parties; Visibility into infrastructure usage; Improved cybersecurity resources
Auditors
  Supported: Improved cybersecurity resources; Potentially better incentive-based approaches
Institutions
  Supported: Access to auditor processes; Stronger legal deterrence capabilities

Conclusion

As we've seen, each evaluation arrangement involves a unique configuration of entities and interactions. These configurations lead to different trade-offs in terms of security, transparency, efficiency, and access. The choice of which arrangement to use depends on the specific context, including the sensitivity of the model, the regulatory environment, and the resources available to each party.

Ultimately, the evaluation of AI models is not just a technical challenge, but a societal one. It requires a delicate balance between innovation, safety, privacy, and transparency. By continuing to refine our evaluation methods, we can help ensure that generative models are deployed responsibly.


More resources

Privacy-Preserving Benchmarks for High-Stakes AI Evaluation

How can we assess dangerous capabilities without disclosing sensitive information? Traditional benchmarks are like exams for AI, complete with reference solutions. However, a benchmark on bioterrorism would amount to a public FAQ on a sensitive topic.
