Anthropic Softens Its Signature Safety Promise While Battling the Pentagon Over “Red Lines”
The company says its old “pause if we can’t ensure safety” rule could leave the world worse off if competitors race ahead—critics see a retreat from the moral high ground at the exact moment Washington is pressuring it to loosen guardrails.

What Happened (Facts)
Anthropic has updated its Responsible Scaling Policy (RSP) to Version 3.0 (dated February 24, 2026) and, in doing so, removed a core element that helped define its safety-first identity: the idea that the company should pause training more powerful models if their capabilities outstrip Anthropic’s ability to control them safely.
Instead of binding “hard commitments,” Anthropic is moving toward what it describes as a framework of public goals against which it will openly grade its own progress, including publishing a Frontier Safety Roadmap and quantified Risk Reports.
Anthropic’s stated rationale is that its previous RSP—about two years old—was designed partly to build industry consensus around safety “guardrails,” but the broader AI industry “blew through” those norms. Anthropic also acknowledged that its older posture was increasingly out of step with political realities and a fast-moving competitive environment.
A key part of the company’s argument is strategic: if responsible labs slow down while less careful actors push forward, the result may be a less safe world, not a safer one. This idea is echoed in reporting that quotes Anthropic’s leadership saying unilateral commitments to stop training don’t make sense if competitors continue “blazing ahead.”
This policy shift is landing in the middle of a separate, high-stakes confrontation with the U.S. Department of Defense. Multiple reports say Defense Secretary Pete Hegseth has given Anthropic a Friday deadline to loosen usage restrictions so the Pentagon can use Claude for “all lawful use,” with potential consequences including termination of a Pentagon deal and other punitive steps.
Reporting describes Anthropic’s “red lines” in those talks as refusing to enable:
AI-controlled weapons, and
mass domestic surveillance of American citizens,
citing reliability concerns and a lack of clear legal/regulatory frameworks for such uses.
At the same time, Anthropic says it remains committed to supporting national security work “in line with what our models can reliably and responsibly do,” and it has portrayed itself as a safety-leading AI company since its founding by former OpenAI employees.
In the broader media coverage of the RSP change, Time reports Anthropic “dropped the central pledge” that previously barred it from training systems unless it could guarantee adequate safety measures in advance, framing the move as a major shift driven by competitive and geopolitical realities.
The Wall Street Journal similarly characterizes the update as dialing back safety commitments to remain competitive, while still emphasizing transparency measures.
What It Means (Analysis)
1) Anthropic is trading “hard stops” for “transparency promises”
The key change isn’t that Anthropic stopped caring about safety—it’s that it’s changing the enforcement mechanism. A “we will pause” pledge is an internal brake. A “we will publish roadmaps and grade ourselves” pledge is an external accountability signal.
That shift matters because transparency does not automatically stop a race. It can inform the public and policymakers, but it doesn’t guarantee restraint when pressure peaks—especially if competitors aren’t bound by similar norms.
2) The company is arguing that unilateral restraint can backfire
Anthropic’s new logic is essentially: if you pause and your rivals don’t, you might hand the future to actors with weaker safeguards. This is a real collective-action problem. But it also becomes a convenient justification for moving fast: “We’re not racing for profit; we’re racing so the safer people win.”
The problem is that outsiders can’t easily verify that “faster” truly produces “safer outcomes,” especially when the same move (dropping a hard commitment) also improves competitive flexibility.
3) The Pentagon standoff makes the timing look politically loaded—even if unrelated
Even if Anthropic’s RSP update was planned independently, it will be interpreted through the lens of the Pentagon dispute. The optics are unavoidable:
Washington pressures a safety-focused lab to loosen rules
The lab loosens a core safety rule in the same week
That doesn’t prove causation, but it makes credibility harder to maintain—particularly for a company that branded itself as the “safety counterweight” in the AI race.
4) “All lawful use” vs. “reliability boundaries” is the coming fight
The Pentagon’s reported position (the military decides what’s lawful) clashes with the stance safety labs increasingly take: capability and reliability impose their own limits, even if something is technically legal.
This is the deeper policy gap: laws are written for humans and institutions; AI systems are probabilistic, failure-prone, and sometimes steerable in unexpected ways. The question becomes: who has the final say over deployment constraints—vendor policy or state authority?
5) Expect more “race-to-the-top” rhetoric, but fewer irreversible commitments
Anthropic’s update suggests a broader industry direction: companies may continue to talk about safety leadership while avoiding hard-to-reverse promises that could put them at a disadvantage. The more intense the competition and procurement pressure, the more safety becomes framed as:
transparency,
audits,
“roadmaps,”
rather than “we will stop.”
That may be pragmatic, but it also increases the burden on regulators and buyers to enforce limits—because vendors are signaling they may not self-restrict when the stakes rise.