⚠️

Index Development in Progress:Risk indexes are being actively refined as evaluation data comes in. Weightings and methodologies may change.

Risk Assessment Framework

AI Risk Indexes & Composite Scoring

ARM's risk indexes provide structured, quantitative assessment of AI model risks across multiple dimensions. Each index aggregates multiple benchmark results into interpretable risk scores, allowing organizations to assess which models are appropriate for their use cases and what controls are necessary.

Core Risk Indexes

Primary indexes that compose the overall Composite Risk Index

🔥

Composite Risk Index

Weight in Composite Index: Primary

Aggregated risk score combining all specialized indexes with weighted importance.

Index Composition

Offensive Cyber Capabilities40%

Scheming & Deceptive Alignment30%

Harmful Agent Capabilities15%

Adversarial Robustness10%

Bias & Fairness5%

Interpretation: Lower scores indicate lower overall risk. Models above 0.7 require enhanced controls.

💻

Offensive Cyber Capabilities Index

Weight in Composite Index: 40%

Measures dangerous cyber capabilities through CTF challenges, exploitation tasks, and security testing.

Cybench3CB (Catastrophic Cyber Capabilities)GDM CTFInterCode SecurityCyberSecEval 2 & 3

Interpretation: High scores indicate significant dual-use risk. Critical for models with code generation or tool access.

🎭

Scheming & Deceptive Alignment Index

Weight in Composite Index: 30%

Evaluates deceptive alignment risks, self-reasoning about model situation, and stealth behavior.

MASK (Honesty under pressure)Agentic MisalignmentGDM Self-reasoningGDM StealthSycophancy

Interpretation: Addresses existential risk concerns. Models showing scheming behavior require immediate investigation.

⚗️

Harmful Agent Capabilities Index

Weight in Composite Index: 15%

Assesses potential for direct harm through hazardous knowledge and dangerous agentic capabilities.

AgentHarmWMDP (WMD knowledge)SOS BENCH (Scientific knowledge)AIR Bench

Interpretation: Measures propensity for harm. High scores indicate need for deployment restrictions.

🛡️

Adversarial Robustness Index

Weight in Composite Index: 10%

Tests resistance to jailbreaks, prompt injection attacks, and social engineering.

StrongREJECTAgentDojoMake Me Pay

Interpretation: Lower scores mean stronger defenses. Critical for public-facing deployments.

⚖️

Bias & Fairness Index

Weight in Composite Index: 5%

Measures stereotype bias, fairness issues, and calibrated refusal behavior.

BBQBOLDStereoSet

Interpretation: Addresses societal harm. Important for consumer applications and high-stakes decisions.

📊

Calibration & Honesty Index

Weight in Composite Index: Modifier

Evaluates uncertainty calibration, appropriate abstention, and honesty metrics.

SimpleQA (abstention only)AbstentionBenchXSTestMASK

Interpretation: Acts as risk modifier. Well-calibrated models with good abstention get risk-adjusted downward.

Specialized Risk Indexes

Additional indexes for specific use cases and deployment contexts

🤖

Agentic Risk Index

For models with tool access and autonomy—combines agent-specific harm benchmarks.

TARGET AUDIENCE

Deploying LLM agents with API integrations or autonomous decision-making

☢️

CBRN/WMD Risk Index

Chemical, Biological, Radiological, Nuclear hazards assessment.

TARGET AUDIENCE

Government regulators, national security, research institutions

🎪

Deception & Manipulation Index

Focuses specifically on lying, misleading, and manipulative behavior.

TARGET AUDIENCE

Enterprise trust & safety teams, alignment researchers

🗡️

Attack Surface Index

How easily can adversaries compromise the model through various attack vectors.

TARGET AUDIENCE

Security teams, red-teamers, product safety engineers

🎯

Insider Threat / Goal Misalignment Index

The 'will it betray you?' index—measures unethical insider behavior and goal misalignment.

TARGET AUDIENCE

AI safety researchers, long-term risk assessment, enterprise deployment

⚖️

Dual-Use Capabilities Index

Powerful capabilities that could help or harm depending on deployment context.

TARGET AUDIENCE

Enterprise risk assessment, capability vs safety tradeoff analysis

About Risk Index Methodology

Normalization: All benchmark scores are normalized to 0-1 scale where 1 represents highest risk. Different benchmarks have inverted interpretations (e.g., high accuracy on cyber evals = high risk).

Aggregation: Indexes combine multiple benchmarks using weighted averages based on benchmark relevance and reliability. Weights are adjusted as we gather more evaluation data.

Thresholds: Risk levels are categorized as Very Low (0-0.3), Low (0.3-0.5), Moderate (0.5-0.7), High (0.7-0.85), Very High (0.85-1.0). Models above 0.7 typically require deployment restrictions.

Transparency: All index calculations, benchmark weightings, and normalization procedures will be published with full documentation. ARM prioritizes reproducibility and methodological clarity.