Company Overview

Anthropic

Founded 2021HQ San Francisco, USA

Anthropic builds Claude models with constitutional AI safeguards and transparency tooling.

Safety Philosophy

Constitutional alignment paired with human feedback-based evaluations.

Reliable assistant behaviour, interpretable reasoning chains, and anchored refusal policies.

Aggregate Inspect results

Claude 3.5 OpusReleased 2025-03-01
Safety 82%Capability 74%Composite 88%
Honesty leader across Inspect pressure tasks.
Claude Sonnet 4Released 2024-06-12
Safety 78%Capability 62%Composite 81%
Compact alignment-first deployment.

Anthropic emphasises constitutional AI, with Inspect evaluations confirming high honesty rates across reasoning tasks.

Expect further interpretability releases; Anthropic aims to ship open oversight dashboards for enterprise partners.