Privacy-Preserving Benchmarks for High-Stakes AI Evaluation

by Paul Bricman, CEO

  • Evaluations
  • Cryptography
  • Infrastructure

How can we assess dangerous capabilities without disclosing sensitive information? Traditional benchmarks are like exams for AI, complete with reference solutions. However, a benchmark on bioterrorism would amount to a public FAQ on a sensitive topic.

To enable evaluation while mitigating misuse, we introduce hashmarks, a simple alternative to benchmarks. In their basic form, hashmarks are benchmarks whose reference solutions have been cryptographically hashed prior to publication.

To assess performance on a hashmark, developers first get their AI to answer an exam question. They then hash the candidate answer and check whether it matches the published reference hash. If the model gets it wrong, the correct answer remains secret.
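As a minimal sketch of this scoring loop, assuming SHA-256 and a simple case/whitespace normalization step (both illustrative choices rather than a fixed specification):

```python
import hashlib

def normalize(answer: str) -> str:
    """Canonicalize an answer so that trivial variations in case and
    whitespace hash to the same value (normalization rules are illustrative)."""
    return " ".join(answer.lower().split())

def hash_answer(answer: str) -> str:
    """Hash a normalized candidate answer with SHA-256."""
    return hashlib.sha256(normalize(answer).encode("utf-8")).hexdigest()

def score(candidates: list[str], reference_hashes: list[str]) -> float:
    """Fraction of items whose candidate answer hashes to the published
    reference hash; incorrect answers reveal nothing about the solution."""
    hits = sum(
        hash_answer(candidate) == reference
        for candidate, reference in zip(candidates, reference_hashes)
    )
    return hits / len(reference_hashes)
```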

However, things are not that simple. We investigate the resilience of hashmarks against half a dozen failure modes, ranging from rainbow table attacks to the Streisand effect associated with obfuscating sensitive information.
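Rainbow table attacks, for instance, rely on precomputed tables of hashes over plausible answers. A conventional countermeasure is to publish a random per-item salt and use a deliberately slow hash, which forces an attacker to redo the work for every item. A rough sketch of that hardening step, assuming scrypt as the slow hash (an illustrative choice, not a fixed scheme):

```python
import hashlib
import os

def publish_item(reference_answer: str) -> tuple[bytes, bytes]:
    """Prepare one hashmark item for release: a random per-item salt plus a
    salted, deliberately slow hash of the reference answer. The salt and the
    digest are published; the answer itself is not."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(reference_answer.encode("utf-8"),
                            salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def check_candidate(candidate_answer: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the salted hash for a candidate answer and compare it to the
    published digest. Per-item salts defeat precomputed rainbow tables, and the
    slow hash raises the cost of brute-forcing likely answers."""
    candidate = hashlib.scrypt(candidate_answer.encode("utf-8"),
                               salt=salt, n=2**14, r=8, p=1)
    return candidate == digest
```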

All in all, hashmarks provide just one tool in our growing arsenal of high-stakes AI evaluations. We look forward to engaging with community feedback before pushing forward with concrete instances of hashmarks.

