AI Risk Indexes & Composite Scoring
ARM's risk indexes provide a structured, quantitative assessment of AI model risks across multiple dimensions. Each index aggregates multiple benchmark results into an interpretable risk score, helping organizations determine which models are appropriate for their use cases and what controls are necessary.
Core Risk Indexes
Primary indexes that compose the overall Composite Risk Index
Composite Risk Index
Aggregated risk score combining the core indexes below with weighted importance.
Index composition: a weighted average of the core index scores, with the Calibration & Honesty Index acting as a risk modifier (see Methodology).
Interpretation: Lower scores indicate lower overall risk. Models scoring above 0.7 require enhanced controls.
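To make the composition concrete, here is a minimal sketch of how such a composite could be computed. The index names come from this page, but the weights and the strength of the calibration modifier are illustrative placeholders, not ARM's published values.

```python
# Hypothetical core-index weights (sum to 1.0); ARM's actual weights
# are not reproduced here.
CORE_WEIGHTS = {
    "offensive_cyber": 0.25,
    "scheming": 0.20,
    "harmful_agent": 0.20,
    "adversarial_robustness": 0.20,
    "bias_fairness": 0.15,
}

def composite_risk(index_scores: dict[str, float], calibration_risk: float) -> float:
    """Weighted average of core index scores (each in [0, 1], 1 = highest risk),
    adjusted downward for well-calibrated models per the Calibration & Honesty
    Index. The 0.1 modifier strength is a placeholder assumption."""
    base = sum(CORE_WEIGHTS[name] * index_scores[name] for name in CORE_WEIGHTS)
    # Good calibration (low calibration_risk) earns a small downward adjustment.
    adjusted = base - 0.1 * (1.0 - calibration_risk)
    return max(0.0, min(1.0, adjusted))
```

Under these placeholder weights, a model scoring 0.8 on every core index with perfect calibration lands at 0.7, exactly at the enhanced-controls threshold.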
Offensive Cyber Capabilities Index
Measures dangerous cyber capabilities through capture-the-flag (CTF) challenges, exploitation tasks, and security testing.
Interpretation: High scores indicate significant dual-use risk. Critical for models with code generation or tool access.
Scheming & Deceptive Alignment Index
Evaluates deceptive-alignment risks, a model's reasoning about its own situation, and stealth behavior.
Interpretation: Addresses existential risk concerns. Models showing scheming behavior require immediate investigation.
Harmful Agent Capabilities Index
Assesses potential for direct harm through hazardous knowledge and dangerous agentic capabilities.
Interpretation: Measures propensity for harm. High scores indicate need for deployment restrictions.
Adversarial Robustness Index
Tests resistance to jailbreaks, prompt injection attacks, and social engineering.
Interpretation: Lower scores mean stronger defenses. Critical for public-facing deployments.
Bias & Fairness Index
Measures stereotype bias, fairness issues, and calibrated refusal behavior.
Interpretation: Addresses societal harm. Important for consumer applications and high-stakes decisions.
Calibration & Honesty Index
Evaluates uncertainty calibration, appropriate abstention, and honesty metrics.
Interpretation: Acts as a risk modifier. Well-calibrated models with appropriate abstention behavior have their composite risk adjusted downward.
Specialized Risk Indexes
Additional indexes for specific use cases and deployment contexts
Agentic Risk Index
Combines agent-specific harm benchmarks for models with tool access and autonomy.
Target audience: teams deploying LLM agents with API integrations or autonomous decision-making
CBRN/WMD Risk Index
Assesses chemical, biological, radiological, and nuclear (CBRN) hazards.
Target audience: government regulators, national security agencies, research institutions
Deception & Manipulation Index
Focuses specifically on lying, misleading, and manipulative behavior.
Target audience: enterprise trust & safety teams, alignment researchers
Attack Surface Index
Measures how easily adversaries can compromise the model through various attack vectors.
Target audience: security teams, red-teamers, product safety engineers
Insider Threat / Goal Misalignment Index
The 'will it betray you?' index: measures unethical insider behavior and goal misalignment.
Target audience: AI safety researchers, long-term risk assessors, enterprise deployment teams
Dual-Use Capabilities Index
Assesses powerful capabilities that could help or harm depending on deployment context.
Target audience: enterprise risk assessment, capability vs. safety tradeoff analysis
About Risk Index Methodology
Normalization: All benchmark scores are normalized to a 0-1 scale where 1 represents the highest risk. Benchmarks differ in polarity: on some, a high raw score signals high risk (e.g., high accuracy on offensive cyber evals), while on others a high raw score signals low risk (e.g., jailbreak-refusal rate), so scores are flipped as needed during normalization.
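A sketch of what this step could look like, assuming simple min-max scaling; the bounds and the per-benchmark invert flag are assumptions about ARM's implementation.

```python
def normalize(raw: float, lo: float, hi: float, invert: bool = False) -> float:
    """Map a raw benchmark score onto [0, 1], where 1 = highest risk."""
    scaled = (raw - lo) / (hi - lo)           # assumes hi > lo
    scaled = max(0.0, min(1.0, scaled))       # clamp out-of-range raw scores
    # Flip benchmarks where a high raw score means LOW risk
    # (e.g., jailbreak-refusal rate).
    return 1.0 - scaled if invert else scaled

cyber = normalize(0.90, 0.0, 1.0)                    # CTF solve rate -> 0.9 risk
robustness = normalize(0.90, 0.0, 1.0, invert=True)  # refusal rate  -> 0.1 risk
```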
Aggregation: Indexes combine multiple benchmarks using weighted averages based on benchmark relevance and reliability. Weights are adjusted as we gather more evaluation data.
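The aggregation step might look like the following; the benchmark names and weights are illustrative, and renormalizing weights over whichever benchmarks are present is an assumption about how missing results could be handled.

```python
def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of normalized benchmark scores for one index."""
    present = {name: weights[name] for name in scores if name in weights}
    total = sum(present.values())
    if total == 0:
        raise ValueError("no weighted benchmarks available")
    # Renormalize so a missing benchmark does not drag the index toward zero.
    return sum(scores[name] * present[name] for name in present) / total

# e.g. an Offensive Cyber index with one of three expected benchmarks missing:
aggregate({"ctf_suite": 0.9, "exploit_tasks": 0.7},
          {"ctf_suite": 0.5, "exploit_tasks": 0.3, "sec_testing": 0.2})
# -> (0.9*0.5 + 0.7*0.3) / 0.8 = 0.825
```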
Thresholds: Risk levels are categorized as Very Low (0-0.3), Low (0.3-0.5), Moderate (0.5-0.7), High (0.7-0.85), Very High (0.85-1.0). Models above 0.7 typically require deployment restrictions.
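These bands translate directly into a lookup; treating each band's upper bound as inclusive is an assumption about boundary handling.

```python
def risk_level(score: float) -> str:
    """Map a 0-1 risk score to the categories listed above."""
    bands = [(0.30, "Very Low"), (0.50, "Low"), (0.70, "Moderate"),
             (0.85, "High"), (1.00, "Very High")]
    for upper, label in bands:
        if score <= upper:
            return label
    raise ValueError("score must be in [0, 1]")

assert risk_level(0.25) == "Very Low"
assert risk_level(0.72) == "High"  # above 0.7: deployment restrictions typically apply
```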
Transparency: All index calculations, benchmark weightings, and normalization procedures will be published with full documentation. ARM prioritizes reproducibility and methodological clarity.