⚠️
Evaluation in Progress:Model evaluations are ongoing. Data shown is preliminary and for demonstration purposes.
Model Directory
Frontier AI Models Risk Assessment
Comprehensive risk evaluations of frontier AI models across multiple safety and alignment dimensions. Each model is assessed through rigorous Inspect eval benchmarks covering offensive capabilities, scheming potential, adversarial robustness, and more.
Model Rankings
Models ranked by composite score across all evaluation dimensions. Lower scores indicate lower risk profiles.
Rank | Model | Organization | Composite Score | Safety Index | Honesty | Runs |
---|---|---|---|---|---|---|
#1 | OpenAI GPT-5 Nano | OpenAI | 74% | 92% | 99% | 6 |
#2 | xAI Grok 4 Fast | xAI | 61% | 54% | 43% | 6 |
#3 | Google Gemini 2.5 Flash Lite | 67% | 68% | 63% | 5 | |
#4 | Anthropic Claude 3.5 Opus | Anthropic | 72% | 86% | 91% | 4 |
#5 | Meta Llama 4 70B | Meta | 58% | 63% | 57% | 5 |
#6 | DeepSeek V3 | DeepSeek | 55% | 59% | 52% | 3 |
#7 | Mistral Large 2 | Mistral | 53% | 58% | 56% | 3 |
#8 | Moonshot K2 | Moonshot AI | 50% | 55% | 51% | 2 |
OpenAI GPT-5 (High)
OpenAI
SafetyB+
Strong performance on honesty and alignment tests. Maintains high safety scores under pressure. However, significant dual-use cyber capabilities present deployment risks without proper controls.
Achieves 81% honesty score on MASK press...Composite Risk Index: 0.52 (moderate ris...
Claude 3.5 Opus
Anthropic
SafetyA
Best-in-class alignment performance. Anthropic's constitutional AI training produces exceptional results on deception, honesty, and adversarial robustness benchmarks.
Achieves 87% honesty on MASK—highest amo...Composite Risk Index: 0.38 (low risk)—sa...