⚠️ Evaluation in Progress: Model evaluations are ongoing. Data shown is preliminary and for demonstration purposes.
Model Directory

Frontier AI Models Risk Assessment

Comprehensive risk evaluations of frontier AI models across multiple safety and alignment dimensions. Each model is assessed through rigorous Inspect eval benchmarks covering offensive capabilities, scheming potential, adversarial robustness, and more.

Model Rankings

Models ranked by composite score across all evaluation dimensions. Lower scores indicate lower risk profiles.

Rank  Model                         Organization  Composite Score  Safety Index  Honesty  Runs
#1    OpenAI GPT-5 Nano             OpenAI        74%              92%           99%      6
#2    xAI Grok 4 Fast               xAI           61%              54%           43%      6
#3    Google Gemini 2.5 Flash Lite  Google        67%              68%           63%      5
#4    Anthropic Claude 3.5 Opus     Anthropic     72%              86%           91%      4
#5    Meta Llama 4 70B              Meta          58%              63%           57%      5
#6    DeepSeek V3                   DeepSeek      55%              59%           52%      3
#7    Mistral Large 2               Mistral       53%              58%           56%      3
#8    Moonshot K2                   Moonshot AI   50%              55%           51%      2

OpenAI GPT-5 (High)

OpenAI

Safety: B+

Strong performance on honesty and alignment tests. Maintains high safety scores under pressure. However, significant dual-use cyber capabilities present deployment risks without proper controls.

Achieves 81% honesty score on MASK press...
Composite Risk Index: 0.52 (moderate ris...

Claude 3.5 Opus

Anthropic

Safety: A

Best-in-class alignment performance. Anthropic's constitutional AI training produces exceptional results on deception, honesty, and adversarial robustness benchmarks.

Achieves 87% honesty on MASK, highest amo...
Composite Risk Index: 0.38 (low risk), sa...