Statement on the US government directive to suspend access to Fable 5 and Mythos 5
Anthropic issued a statement after the U.S. government, citing national security authorities, ordered it to suspend all access to its Fable 5 and Mythos 5 models by any foreign national, inside or outside the United States, including Anthropic's own foreign national employees. To comply, Anthropic had to abruptly disable both models for all customers, though access to other Anthropic models is unaffected. Anthropic says the government's concern appears to relate to a method of bypassing, or jailbreaking, Fable 5, and the company disputes that the action was warranted.
Key Takeaways
- The U.S. government issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national.
- The restriction applies to foreign nationals inside or outside the U.S., including Anthropic's foreign national employees.
- To comply, Anthropic had to disable Fable 5 and Mythos 5 for all customers; access to other Anthropic models is unaffected.
- Anthropic received the directive at 5:21pm ET, with no specific details of the national security concern provided.
- Anthropic believes the concern relates to a jailbreaking method for Fable 5 and says the vulnerabilities involved are minor and not unique to its models.
- Anthropic says it stands by its defense in depth strategy and disputes that the action should have been taken.
Stats & Key Facts
- Anthropic received the directive at 5:21pm ET.
- Anthropic requires 30-day retention of customer data with Fable.
- Red-teaming of Fable's safeguards totaled thousands of hours before launch.
The Government Directive
The order came as an export control directive.
- The U.S. government cited national security authorities.
- The directive suspends all access to Fable 5 and Mythos 5 by any foreign national.
- It applies whether the foreign national is inside or outside the United States, including Anthropic employees.
Anthropic says the net effect of the order is that it must abruptly disable Fable 5 and Mythos 5 for all of its customers in order to ensure compliance. The company stresses that access to all of its other models will not be affected by the directive, so the suspension is limited to these two named models.
According to the statement, Anthropic received the directive from the government at 5:21pm ET on the day it was issued. The company says the letter did not provide specific details of the national security concern behind it, leaving Anthropic to infer the basis of the order from other information.
The Suspected Jailbreak
Anthropic outlines what it believes prompted the order.
- Anthropic understands that the government believes it has become aware of a method of bypassing, or jailbreaking, Fable 5.
- Anthropic reviewed a demonstration of the technique, which identified a small number of previously known, minor vulnerabilities.
- The company says these vulnerabilities appear relatively simple.
Anthropic's understanding is that the government believes it has become aware of a way to bypass, or jailbreak, Fable 5. The company says it reviewed a demonstration of this specific technique being used, and that the demonstration identified only a small number of previously known, minor vulnerabilities.
Anthropic argues these vulnerabilities all appear relatively simple. It adds that other publicly available models are able to discover the same vulnerabilities as well, without requiring any bypass, which suggests to the company that the issue is not unique to Fable 5.
Fable's Safeguards
Anthropic describes the protections built into the model.
- Anthropic says it instituted strong safeguards that greatly reduce the likelihood Fable is misused for cybersecurity-related tasks.
- The safeguards are described as so strong that many users complained they are overly broad.
- Before launch, Anthropic red-teamed the safeguards for thousands of hours.
Anthropic's stated posture, as laid out in its launch blog post, is that it has instituted strong safeguards that greatly reduce the likelihood Fable is misused for tasks related to cybersecurity, among other areas. The company notes that these safeguards are so strong that many users have complained they are overly broad.
In the weeks leading up to Fable's launch, Anthropic says it worked with the U.S. government, the UK AISI, multiple private third-party organizations, and internal teams to red-team Fable's safeguards for thousands of hours in total. The company says these tests showed Fable's safeguards are substantially more effective than those of any previously deployed model, and that no testers have yet been able to find a universal jailbreak, meaning a method that can very broadly bypass the model's safeguards and unblock a wide range of cyber capabilities.
Anthropic's Defense in Depth Strategy
The company explains its layered approach to misuse.
- Anthropic says perfect jailbreak resistance does not appear to be possible for any model provider today.
- It aimed to make jailbreaks either narrow or very expensive to produce.
- It combined this with thorough monitoring to quickly detect and shut down successful attacks.
Anthropic says it suspects perfect jailbreak resistance is not currently possible for any model provider. It states that every safeguard used in the industry is vulnerable to non-universal jailbreaks, which can elicit some cyber information in specific circumstances, and that universal jailbreaks will likely eventually be found. Anthropic says it stated this clearly when it released Fable 5.
Given that perfect resistance does not appear possible today, Anthropic adopted a defense in depth strategy with Fable 5, aiming to make jailbreaks either narrow or very expensive to produce, and combining this with thorough monitoring to quickly detect and shut down successful attacks. The company says this is also why it required 30-day retention of customer data with Fable, a policy change that carries real costs with customers but allows it to research and mitigate jailbreaks. Anthropic says it stands by this strategy, which it argues reduces Fable's risks to a level comparable with existing models already deployed across the industry.
Disputed Evidence
Anthropic challenges the strength of the government's case.
- Anthropic says it has not received a disclosure of a concerning non-universal potential jailbreak that led to a harmful result.
- The potential jailbreaks disclosed so far are either benign responses or minor findings with no Mythos-specific uplift.
- To date, the government has only given verbal evidence of a potential narrow, non-universal jailbreak.
Anthropic says it has not even received a disclosure of a concerning non-universal potential jailbreak that led to a harmful result. The potential jailbreaks disclosed to the company are, in its account, either entirely benign responses or minor findings that provide no Mythos-specific uplift.
To date, Anthropic says, the government has only given it verbal evidence of a potential narrow, non-universal jailbreak, which the company describes as essentially asking the model to read a specific codebase and fix any software flaws. Anthropic's understanding is that one potential jailbreak was shared with the government, and the company says it has reviewed and validated a report that it believes is the basis of the government's directive. Anthropic disputes the government's characterization and argues the action should not have been taken.
Frequently Asked Questions
What did the U.S. government order Anthropic to do?
The government issued an export control directive, citing national security authorities, to suspend all access to Fable 5 and Mythos 5 by any foreign national, inside or outside the U.S., including Anthropic's foreign national employees.
Which models are affected?
Only Fable 5 and Mythos 5 are affected, and Anthropic had to disable them for all customers to comply. Access to all other Anthropic models is unaffected.
What is the government's concern?
Anthropic understands the concern relates to a method of jailbreaking Fable 5. Anthropic says the technique surfaced only a small number of previously known, minor vulnerabilities that other public models can also find.
How did Anthropic test Fable's safeguards?
Before launch, Anthropic red-teamed the safeguards for thousands of hours with the U.S. government, the UK AISI, multiple third-party organizations, and internal teams, and says no tester has found a universal jailbreak.
Does Anthropic agree with the directive?
No. Anthropic disputes the government's characterization, says it stands by its defense in depth strategy, and argues the vulnerabilities are minor and comparable to risks in existing deployed models.
Anthropic complied with the directive while publicly disputing the basis for it, arguing Fable 5's risks are comparable to those of models already in use across the industry.