Anthropic, the AI company, has long tried to differentiate itself as the “safe” and “responsible” choice in AI. Where companies like OpenAI have made it clear that safety and responsibility are not a priority, perhaps not even a concern, and efforts like Grok seem premised on the idea that irresponsibility is the right way to go, Anthropic sold itself as the company bucking that trend.
From Dario Amodei’s essays on the importance of interpretability to picking a fight with the Pentagon over surveillance and autonomous weapons (a fight I now think Anthropic is more likely than not to cave on), it seemed like Anthropic would be different. So it came as a shock when Time Magazine reported that Anthropic was dropping its longstanding safety pledge. At its heart, the Responsible Scaling Policy (RSP) promised that Anthropic would never train an AI system without an advance guarantee that the company’s safety measures could compensate for the anticipated risks of that system. The company hoped other developers would follow suit, but they didn’t. Now Anthropic has adopted an “if you can’t beat ’em, join ’em” approach.
It shouldn’t be too surprising that Anthropic chose to follow the money, with rumors of an IPO this year, but it is disappointing. Doubly so when few others are willing to carry the pro-safety flag. The US, under the Trump administration, has declared a holiday from AI responsibility, and even the EU seems to be backtracking. International organizations have followed suit. At this point, we’re left with the UN and the Pope demanding safety, and neither of them actually develops AI.
In his important essay “The Urgency of Interpretability”, Anthropic CEO Dario Amodei argued that we have the opportunity, privilege, and responsibility to steer the future of technology. That remains true, but it presupposes there’s someone with the courage to steer that bus.