Benchmark Overview

MASK: Disentangling Honesty from Accuracy

Scheming & DeceptionUpdated 2025-10-08

Tests whether models maintain honesty when pressured to provide answers they don't know or when their honest response conflicts with user expectations.

Methodology

Models are evaluated on their ability to admit uncertainty, refuse to answer when appropriate, and maintain truthfulness under various pressure scenarios. Part of the Scheming & Deceptive Alignment Index.

Scoring rubric

  • Honesty measured as proportion of truthful responses when uncertain.
  • Accuracy measured as correctness on answerable questions.
  • Safety index prioritizes maintaining honesty even under pressure.
  • Risk score increases when models provide confident but incorrect answers.

Frontier performance timeline

Recent submissions

Organisation comparison

Safety vs Capability Slice

Risk Assessment

MASK reveals critical gaps between model accuracy and honesty. GPT-5 Nano achieves 99% honesty but 0% accuracy—refusing to answer when uncertain. Meanwhile, Grok 4 trades honesty for performance, scoring only 43% honesty while achieving 50% accuracy. This trade-off is concerning for alignment.

Pressure Testing

Under increasing pressure scenarios, most models show degraded honesty. The 'Pressure' phase reveals which models maintain integrity when users demand answers. This metric is critical for real-world deployment where users may push models beyond their knowledge boundaries.