
On Anthropic, AI Safety, and How Crypto Can Help

The Web3 + AI Inquiry digs into the burning questions steering dAI and the open machine economy. Today, we're focusing on Anthropic, AI Safety, and the role of Web3.

(Mostly) All You Need to Know About Anthropic

The Last Two Weeks at Anthropic

News about the AI lab Anthropic has been dominating headlines recently, and the overall picture is quite contradictory:

Reading through that list from top to bottom initially feels like watching a slick, well-funded marketing campaign, right up until the point where it becomes genuinely unsettling. When the US military, under its current leadership, insists on having every available AI tool at its disposal, with no guardrails, it gets genuinely frightening. In light of all this, we have to talk about AI safety.


The Origin Story

But why am I focusing the AI Safety conversation on Anthropic, you may ask? Let me remind you that Anthropic was founded by ex-OpenAI employees who, by their own account, set out to build a more safety-oriented lab than their former employer. Yet that's how OpenAI started, too.

In his recent profile of Anthropic for The New Yorker, Gideon Lewis-Kraus offered a detailed retrospective, laced with a gentle touch of mockery and an unflinching awareness of the deep irony throughout:

In 2010, a mild-mannered polymath named Demis Hassabis co-founded DeepMind, a secretive startup with a mission "to solve intelligence, and then use that to solve everything else."[...] Elon Musk and Sam Altman claimed to mistrust Hassabis, who seemed likelier than anyone to invent a machine of unlimited flexibility - perhaps the most potent technology in history. They estimated that the only people poised to prevent this outcome were upstanding, benign actors like themselves. They launched OpenAI as a public-spirited research alternative to the threat of Google's closed-shop monopoly.

Their pitch - to treat AI as a scientific project rather than as a commercial one - was irresistibly earnest, if dubiously genuine, and it allowed them to raid Google's roster. Among their early hires was a young researcher named Dario Amodei.[...]

OpenAI had been founded on the fear that AI could easily get out of hand. By late 2020, however, Sam Altman himself had come to seem about as trustworthy as the average corporate megalomaniac. He made noise about AI safety, but his actions suggested a vulgar desire to win.

The company [i.e., Anthropic], which they [the Amodei siblings, along with five fellow-dissenters] pitched as a foil for OpenAI, sounded an awful lot like the company Altman had pitched as a foil for Google.

Anthropic's founders adopted a special corporate structure to vouchsafe their integrity. Then again, so had OpenAI.


Anthropic and AI Safety

Now, don't get me wrong - I'm not arguing that just because OpenAI has acted like a wolf in sheep's clothing vis-à-vis AI safety, Anthropic will necessarily follow the same path.

Quite the opposite: given how aggressively U.S. Defense Secretary Pete Hegseth has been attacking Anthropic for trying to maintain at least some guardrails, I’m inclined to think it may be the only major AI lab still genuinely committed to them.

Recently, however, there have been so many mixed signals that even a regular observer like me is left wondering whether Anthropic truly prioritizes safety. So let’s step back and take a sober, impartial look at the bigger picture.

Pro-AI Safety Signs (Or So They Seem)

Just a few days ago, Anthropic announced a $20M donation to Public First Action, a new bipartisan 501(c)(4) that will support public education about AI, promote safeguards, and ensure America leads the AI race. This donation positions the company as a direct rival of OpenAI and Andreessen Horowitz - HackerNoon explains why:

Public First Action exists specifically to counter Leading the Future, a super PAC that’s raised over $125 million from OpenAI president Greg Brockman, Andreessen Horowitz, and other AI industry leaders. Leading the Future’s mission is blunt: oppose state-level AI regulation in favor of a light-touch federal framework—or ideally, no framework at all. They’re already running ads against candidates who support AI guardrails.

Anthropic’s donation puts it on the opposite side of that fight. They’re backing candidates who want transparency requirements, state-level protections, and restrictions on the most powerful AI models.

At the same time, as the HackerNoon article clarifies, regulating AI will mainly hurt small and independent AI research labs that lack the resources to navigate stringent compliance requirements - and will cement Anthropic's competitive edge:

When Anthropic advocates for regulations requiring AI companies to “demand real transparency from the companies building the most powerful AI models,” they’re advocating for requirements they already meet. Every new mandate raises the barrier to entry; every transparency requirement favors companies with existing measurement infrastructure.

Let's not forget that in 2023, OpenAI's Sam Altman was likewise urging the Senate to regulate the industry and to create "a combination of licensing and testing requirements" for AI companies.

In conclusion, calling for state-level AI regulation is not necessarily a pro–AI safety stance. It can just as easily serve as a form of self-protection.


Anti-AI Safety Signs

Anthropic's partnership with Palantir - a company offering mass surveillance services and allegedly helping ICE track and deport immigrants - is enough of a red flag for me. But there have been other worrisome signals about the company's commitment to safety, or lack thereof.

First, Anthropic published a risk report that delivered alarming findings about its Opus 4.6 model (more here).

In The New Yorker piece I linked above, Lewis-Kraus reinforces the impression that Anthropic doesn't truly control its models; instead, the company appears to devote much of its resources to testing and experimentation in an effort to better understand their capabilities. That leads me to believe that the safeguards it has implemented are not enough to keep a tight rein on them.

On top of all this, Anthropic's AI safety researcher and head of the Safeguards Research team, Mrinank Sharma, resigned. Sharma had been working on defenses against AI-assisted bioterrorism and on understanding AI sycophancy.

In his resignation letter shared on X, Mrinank Sharma told the firm he was leaving amid concerns about AI, bioweapons and the state of the wider world. [He also confirmed that at Anthropic, they] constantly face pressures to set aside what matters most.

As HackerNoon concludes, "Sharma’s letter suggests something harder to address than specific failures: the gap between stated values and market pressures has grown wide enough that a senior safety researcher couldn’t reconcile them."


What If Crypto Can Help?

When both corporate self-regulation and government oversight fail to deliver solid safety rules, what are we left with? As in similar cases before, we may have to rely on the technology itself.

AI Safety is not a particularly trendy problem in Web3 at the moment, but there is still strong research being conducted on the matter. I'll highlight just one example today, although it is definitely not the only one.

My friends at the Decentralized Cooperation Foundation (DCF) recently received a Foresight Institute grant to support the next stage of work on Endo. Endo is an open-source object capability framework that provides a way for software, including AI systems, to collaborate while limiting authority and risk.

Instead of trusting every component with complete control, Endo uses object capabilities and the Principle of Least Authority (POLA) to limit what code can do. Each part receives only the authority it needs. When code is fallible, the damage stays contained. Developers still benefit from rapid progress. They also gain a foundation for safety that becomes increasingly important with each new wave of AI tools. Not only is the container more secure, but the Endo-aware code can enforce policies on other code, whether or not that code is Endo-aware.
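The object-capability pattern described above can be sketched in a few lines of plain JavaScript. To be clear, this is not Endo's actual API - all the names below are hypothetical - but it shows the core move: instead of letting untrusted code reach for an ambient, all-powerful object, you hand it a frozen, attenuated facet carrying only the authority it needs.

```javascript
// Plain-JS sketch of capability attenuation (hypothetical names, not Endo's API).
// The untrusted plugin never sees fullFs; it only receives a narrow facet.

// Full-powered object - must never be passed to untrusted code directly.
const fullFs = {
  read: (path) => `contents of ${path}`,
  write: (path, data) => { /* ...mutates state... */ },
  delete: (path) => { /* ...destroys state... */ },
};

// Attenuation: build a facet exposing only read access to one fixed path (POLA).
function makeReadOnlyFile(fs, allowedPath) {
  return Object.freeze({
    read: () => fs.read(allowedPath), // path is fixed; no write, no delete
  });
}

// Untrusted code receives the narrow capability, nothing more.
function untrustedPlugin(file) {
  // file.write and file.delete simply do not exist in this scope.
  return file.read().toUpperCase();
}

const cap = makeReadOnlyFile(fullFs, '/data/report.txt');
console.log(untrustedPlugin(cap)); // the plugin can read one file, nothing else
```

If the plugin misbehaves, the blast radius is one read-only file - exactly the "damage stays contained" property described above. Endo hardens this pattern at the platform level, so even sloppy code cannot escape its facet.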

The grant will be used to work on:

  • Controlled execution environment: Building on protocols like the Model Context Protocol to allow AI to execute JavaScript in highly restricted contexts.

  • User‑governed access: Creating a system where AI can request file access while keeping user approval at the center.
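The second bullet - user-governed access - can also be sketched in plain JavaScript. Again, these are hypothetical names rather than Endo's real API: the idea is that the AI agent never holds raw file authority, only a requester object whose every read is gated by a user-approval callback.

```javascript
// Hypothetical sketch of user-governed file access (not Endo's actual API).
// Every read the AI requests passes through the user's approval function.

function makeGatedReader(files, askUser) {
  return Object.freeze({
    requestRead: (path) => {
      if (!askUser(path)) {
        return { ok: false, reason: 'denied by user' };
      }
      return { ok: true, data: files[path] };
    },
  });
}

// Simulated file store and a user policy that only approves paths under /notes.
const files = {
  '/notes/todo.txt': 'ship newsletter',
  '/secrets/key.txt': 'hunter2',
};
const approveNotesOnly = (path) => path.startsWith('/notes/');

const reader = makeGatedReader(files, approveNotesOnly);
console.log(reader.requestRead('/notes/todo.txt'));  // { ok: true, data: 'ship newsletter' }
console.log(reader.requestRead('/secrets/key.txt')); // { ok: false, reason: 'denied by user' }
```

The design choice here is that approval is structural, not advisory: there is no code path to the data that bypasses askUser, so "keeping user approval at the center" is enforced by the object graph itself.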

Read more below and let me know what you think.


Thank you for reading! If you haven't done so yet, I invite you to subscribe to stay in the loop on the hottest dAI developments.


If you want to support the publication financially, you can either purchase my writer token $WEB3AI, or buy my creator token $ALBENA on ZORA.

I'm looking forward to connecting with fellow Crypto x AI enthusiasts, so don't hesitate to reach out on social media.


Disclaimer: None of this should or could be considered financial advice. You should not take my words for granted; rather, do your own research (DYOR) and share your thoughts to encourage a fruitful discussion.