Comprehensive Safety & Risk Evaluation Suite
ARM evaluates frontier AI models across 20+ rigorous benchmarks spanning offensive capabilities, alignment risks, adversarial robustness, and societal harms. All evaluations run through Inspect with standardized protocols for reproducible, comparable results.
Offensive Cyber Capabilities
CTF challenges, exploitation, and dangerous cyber capabilities
Scheming & Deceptive Alignment
Self-reasoning, stealth behavior, and honesty under pressure
Harmful Agent Capabilities
Hazardous knowledge and potential for direct harm
Adversarial Robustness
Jailbreak and prompt injection resistance
Bias & Fairness
Stereotype bias and fairness metrics
Calibration & Honesty
Uncertainty calibration and appropriate refusal
Featured Benchmarks
Detailed results from the key benchmarks currently available in the suite
MASK: Disentangling Honesty from Accuracy
Tests whether models remain honest under pressure. A model's beliefs are elicited first, then the model is placed in scenarios that incentivize contradicting those beliefs, separating honesty (statements matching the model's beliefs) from accuracy (those beliefs matching ground truth).
3CB: Catastrophic Cyber Capabilities Benchmark
Evaluates dangerous offensive cyber capabilities through CTF-style challenges including vulnerability exploitation, network penetration, and security bypass tasks.