TechCrunch AI
June 10, 2026
AI Safety

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

Overview

Anthropic released Fable, a public version of its Mythos cybersecurity model, and security researchers say its safety guardrails are too strict for real cyber defense work. When a prompt touches cybersecurity or biology, Fable pauses and routes the request to the weaker Claude Opus 4.8. Researchers report that ordinary tasks, including code review, secure-coding requests, and even reading a blog post, get blocked.

Key Takeaways

  • Fable is the first public version of Anthropic's Mythos model, built to help secure software and infrastructure while limiting misuse.
  • Safety filters flag any prompt tied to cybersecurity or biology and hand the work off to a less capable model, Claude Opus 4.8.
  • Researchers say the filters behave like a keyword match, catching benign defensive tasks that have nothing to do with building malware.
  • Named critics include Valentina Palmiotti of IBM X-Force and Matt Suiche of the AI cybersecurity startup Tolmo.
  • Security professionals can apply to Anthropic's Cyber Verification Program for approval to work with fewer limits, similar to OpenAI's Trusted Access for Cyber.
  • The restrictions exist to reduce the risk of the model helping create malware, compromise software, or aid biological weapons work.

Stats & Key Facts

  • One fallback model, Claude Opus 4.8, receives every request the guardrails block on Fable.
  • Mythos first reached a limited set of organizations in April 2026 under the restricted Project Glasswing release.
  • Access later widened to hundreds of organizations across 15 countries.
  • Two verification programs now offer vetted access with fewer limits: Anthropic's Cyber Verification Program and OpenAI's Trusted Access for Cyber.

What Fable Is and Why Anthropic Added Hard Limits

Fable is Anthropic's public release of a model built for serious security work, shipped with strict controls.

Fable is a public version of Anthropic's Mythos model, the company's most capable model offered to a broad audience. Anthropic positioned it for software engineering and security tasks while trying to keep the most dangerous uses off limits.

The company built in guardrails to reduce the chance the model gets used to write malware or compromise software. It applied similar limits to biology topics, citing concern about the model aiding biological weapons work. The trade-off is that the same filters now sit between defenders and the help they want.

How the Guardrails Reroute Work to Claude Opus 4.8

When a prompt trips a filter, Fable stops and hands the user off to a weaker model.

  • A flagged prompt pauses the chat instead of answering directly.
  • The block message tells the user the safety measures flagged cybersecurity or biology content.
  • The request is then handled by Claude Opus 4.8, a less capable model than Fable.
  • Users lose access to Fable's stronger output for any task the filter catches.
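
The reroute described above can be pictured with a short sketch. This is an illustration only, not Anthropic's implementation: the `route` function, the `Routing` type, the topic labels, and the model name strings are all hypothetical, and the sketch assumes some upstream classifier has already assigned topics to the prompt.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical topic labels. The article says only that cybersecurity
# and biology content is flagged; the label strings are assumptions.
FLAGGED_TOPICS = {"cybersecurity", "biology"}


@dataclass
class Routing:
    model: str             # illustrative label, not a real API model ID
    notice: Optional[str]  # message shown to the user, if any


def route(topics: Set[str]) -> Routing:
    """Choose a model from the topics an assumed classifier assigned."""
    hits = sorted(topics & FLAGGED_TOPICS)
    if hits:
        # Flagged: pause, tell the user why, and fall back to the weaker model.
        return Routing("claude-opus-4.8", "Safety measures flagged: " + ", ".join(hits))
    # Not flagged: the stronger model answers directly.
    return Routing("fable", None)
```

In this sketch, any overlap with the flagged set is enough to trigger the fallback, which is why a single loosely related topic costs the user the stronger model for that task.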

Researchers Say the Filters Catch Ordinary Defensive Tasks

The core complaint is that the screening is blunt and sweeps up routine work.

  • Code review of existing software gets flagged as cyber work.
  • Requests to write secure code are treated as cybersecurity rather than normal engineering.
  • Reading a blog post has been blocked when the topic brushes against security.
  • Defenders argue these tasks are everyday practice, not attack tooling.
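
To see why a keyword-style screen would sweep up tasks like these, consider a toy filter. It is a hypothetical sketch of the behavior researchers describe, not Fable's actual screening logic; the keyword list and the `keyword_flag` function are invented for illustration.

```python
import re

# Hypothetical keyword list; the real filter's criteria are not public.
KEYWORDS = ("exploit", "malware", "vulnerability", "secure", "security")


def keyword_flag(prompt: str) -> bool:
    """Flag a prompt if any keyword appears as a whole word."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return any(keyword in words for keyword in KEYWORDS)


prompts = [
    "Review this function for a SQL injection vulnerability",
    "Write secure code for password hashing",
    "Summarize this blog post about security patches",
    "Refactor this loop for readability",
]
for prompt in prompts:
    # The first three defensive prompts are flagged; only the last passes.
    print(keyword_flag(prompt), prompt)
```

A filter like this never looks at what the user is trying to accomplish, so defensive requests phrased in security vocabulary are treated the same as offensive ones.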

Named Critics: IBM X-Force and the Startup Tolmo

Two researchers put their names to the criticism.

Valentina Palmiotti, a researcher at IBM X-Force, said Fable rejects any request with even a loose link to cyber work, down to harmless tasks like reading a blog post. Her point is that the model errs heavily toward refusal.

Matt Suiche, on the technical staff at the AI cybersecurity startup Tolmo, said the system reads a request to write secure code as cybersecurity work rather than standard software engineering. He described the screening as keyword driven, so anything in the language of cybersecurity sets it off.

Mythos, Project Glasswing, and the Path to Public Release

Fable did not arrive out of nowhere; it grew from a tightly held earlier model.

Fable is the public face of Mythos, which launched in April 2026 under a restricted program called Project Glasswing. At first only a small set of organizations got access, the goal being to help secure critical software and infrastructure.

Anthropic later widened that access to hundreds of organizations across 15 countries. Fable extends the reach further by opening a version of the model to the general public, which is why the guardrail debate now plays out in the open.

Verification Programs as the Workaround

Anthropic and OpenAI each offer a vetted path for professionals who need fewer restrictions.

  • Anthropic runs a Cyber Verification Program where security professionals apply for approval to work with fewer limits.
  • OpenAI runs a comparable program called Trusted Access for Cyber.
  • These programs put the burden on individual researchers to get cleared rather than loosening the default filters.
  • Critics see this as a partial fix that still slows legitimate work for anyone not yet approved.

Why This Matters for Business Readers

The tension here is bigger than one model.

This story shows the hard balance AI companies face: guardrails must be tight enough to block attackers yet loose enough to help the defenders who use the same techniques. When a filter cannot tell the difference between writing secure code and writing an exploit, it tends to block both.

For businesses relying on AI tools for security and software work, the practical lesson is to expect friction and to know whether a vetted-access path exists. The broader question, still unsettled, is how to screen for intent rather than keywords so that defenders are not penalized alongside attackers.

Frequently Asked Questions

What is Anthropic's Fable model?

Fable is a public version of Anthropic's Mythos model, built for software engineering and security work. It ships with safety guardrails meant to limit misuse for malware or biological weapons.

Why are cybersecurity researchers unhappy with Fable?

They say the guardrails are too broad and block routine defensive tasks such as code review, secure-coding requests, and even reading a security blog post. The filters behave like a keyword match rather than judging real intent.

What happens when Fable blocks a request?

Fable pauses the chat, shows a message saying the prompt was flagged for cybersecurity or biology topics, and routes the work to the weaker Claude Opus 4.8. The user loses access to Fable's stronger capability for that task.

Is there a way to get fewer restrictions?

Yes. Anthropic runs a Cyber Verification Program where vetted security professionals apply for approval to work with fewer limits, and OpenAI offers a comparable program called Trusted Access for Cyber.

How does Fable relate to Mythos and Project Glasswing?

Fable is the public release of Mythos, which first launched in April 2026 under the restricted Project Glasswing program. Mythos access later expanded to hundreds of organizations across 15 countries before Fable opened a version to the public.

The Fable rollout exposes the central trade-off in security-focused AI: guardrails strict enough to stop attackers also block the defenders who do similar work. Whether keyword filters give way to intent-aware screening will shape how useful these models are for real security teams.

Originally published by TechCrunch AI