More details on Fable 5's cyber safeguards and our jailbreak framework
What is and isn't blocked by our cyber classifiers, and a first draft of our jailbreak severity framework
Jul 2, 2026
Claude Fable 5 has been re-deployed and is now available globally for all users. We're taking this opportunity to share further information in two areas. First, we provide more information on the cybersecurity safeguards (specifically, the safety classifiers) that we launched with the model.
Key Takeaways
- The safety classifiers are AI systems that accompany the model and detect and block dangerous (or potentially dangerous) cybersecurity uses.
Here, we provide a detailed list of the types of harms Fable 5's classifiers are, and are not, designed to prevent.
- Jailbreaks vary in severity: sometimes they only unblock minor undesirable behaviors, and sometimes they unblock a wide range of harmful outputs, making a model much more dangerous.
Yet there is no agreed-upon framework for describing a given jailbreak's severity.
- We welcome feedback and critique on this framework at cyber-safeguards@anthropic.
We've also launched a HackerOne program where security researchers can submit potential cyber jailbreaks they discover in Fable 5 for our review.
- Many cybersecurity capabilities can be used for either benign or harmful purposes.
For example, we want to allow cyber defenders to use our models to scan their codebases to find software vulnerabilities, but this same capability could, in the wrong hands, be the precursor to a cyberattack.
- The safety margin means that a request has to look very clearly safe to avoid triggering the classifier.
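
To make the safety-margin idea concrete, here is a minimal sketch of how a margin can shift a blocking decision. It is an illustration only and does not describe how Fable 5's classifiers are implemented: the `MarginPolicy` class, the notion of a single risk score between 0 and 1, and the numeric values are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarginPolicy:
    """Hypothetical blocking policy; every value here is an illustrative assumption."""

    decision_boundary: float = 0.5  # assumed score at which a request looks as harmful as not
    safety_margin: float = 0.3      # assumed extra caution applied below that boundary

    @property
    def block_threshold(self) -> float:
        # The margin lowers the score needed to trigger a block.
        # Clamp at 0.0 so the threshold remains a valid score.
        return max(0.0, self.decision_boundary - self.safety_margin)

    def is_blocked(self, risk_score: float) -> bool:
        # risk_score is an assumed classifier output in [0, 1]; higher means riskier.
        if not 0.0 <= risk_score <= 1.0:
            raise ValueError("risk_score must be in [0, 1]")
        return risk_score >= self.block_threshold


with_margin = MarginPolicy()
without_margin = MarginPolicy(safety_margin=0.0)

# An ambiguous request (score 0.35) passes without a margin but is blocked with one.
ambiguous_blocked_with_margin = with_margin.is_blocked(0.35)
ambiguous_blocked_without_margin = without_margin.is_blocked(0.35)

# A clearly safe request (score 0.1) passes either way.
clearly_safe_blocked = with_margin.is_blocked(0.1)
```

In this sketch, only requests scoring well below the decision boundary get through, which mirrors the point above: with a margin in place, a request has to look very clearly safe to avoid triggering the classifier, at the cost of blocking some ambiguous but benign requests.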