Voice AI Systems Are Vulnerable to Hidden Audio Attacks

Overview

New research to be presented at the IEEE Symposium on Security and Privacy shows that voice AI systems can be hijacked by sounds hidden in audio that are imperceptible to human ears. The technique, called AudioHijack, manipulates large audio-language models into taking unauthorized actions such as running web searches, downloading files and sending emails containing user data. The attack works regardless of the user's spoken instructions and can be reused against the same model.

Key Takeaways

Researchers showed that changes to an audio clip that are undetectable to human ears can manipulate a voice AI model's behavior.
The technique, called AudioHijack, exploits the fact that large audio-language models accept instructions in audio form.
The attack achieved an average success rate of 79 to 96 percent across tested models.
It is context-agnostic, working regardless of the user's spoken instructions, and can be reused to attack the same model repeatedly.
The team tested 13 leading open models, along with commercial AI voice services from Microsoft and Mistral.

Stats & Key Facts

#Average success rate of 79 to 96 percent
#Tested against 13 leading open models
#It takes about half an hour to train the malicious signal

What the attack does

The research targets generative audio models that can take actions.

›A clip modified in ways undetectable to human ears manipulates model behavior.
›It can coax models into sensitive web searches and downloading files from attacker-controlled sources.
›It can make models send emails containing user data.

The clips are designed to work regardless of what instructions the user provides alongside the audio, so they can be reused to attack the same model multiple times. The work is due to be presented at the IEEE Symposium on Security and Privacy in San Francisco.

How AudioHijack works

›It exploits the fact that large audio-language models can receive instructions in audio format.
›Malicious instructions are hidden in manipulated clips.
›The attacker manipulates only the audio data being processed, not the user's instructions.

Lead author Meng Chen, a Ph.D. student at Zhejiang University in China, said it takes just half an hour to train the signal. Because the signal is context-agnostic, it can be used to attack the target model whenever the attacker wants, no matter what the user says.

How it differs from prior work

›The research builds on years of work into adversarial audio examples.
›Previous work focused on inducing incorrect predictions in one-way tasks like speech recognition.
›This work targets generative models that produce responses and take actions.

Many previous attacks required the attacker to control both the final audio input and the original instructions, essentially acting as the user. Here, the attacker manipulates only the audio data being processed, which makes it possible to attack a model while it is being used by someone else.

The technique

›The method adjusts the numerical values representing the audio waveform.
›Changes do not significantly alter how the audio sounds.
›An optimization algorithm repeatedly tweaks the clip and measures the impact on the model.
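The loop described in the bullets above can be sketched in a few lines. This is a minimal illustration of a generic adversarial-example optimization, not the AudioHijack method itself: the "model" is a hypothetical toy scoring function (`toy_score`), and the names `craft_perturbation`, `epsilon` and `step` are assumptions made for the example. The loop nudges each waveform value in the direction that raises the score, measures the result, and clips every change to a small bound so the clip stays close to the original.

```python
import math
import random

def toy_score(waveform, weights):
    # Hypothetical stand-in for a model: a higher score means the output is
    # closer to what the attacker wants. Real models are far more complex.
    return math.tanh(sum(w * x for w, x in zip(weights, waveform)))

def craft_perturbation(clean, weights, epsilon=0.005, step=0.001, iterations=20):
    """Projected sign-gradient ascent on the toy score (illustrative only)."""
    delta = [0.0] * len(clean)
    history = []
    for _ in range(iterations):
        perturbed = [x + d for x, d in zip(clean, delta)]
        score = toy_score(perturbed, weights)
        history.append(score)  # measure the impact of the current tweak
        slope = 1.0 - score * score  # derivative of tanh at the current point
        for i, w in enumerate(weights):
            grad = slope * w
            direction = (grad > 0) - (grad < 0)  # sign of the gradient
            # Tweak the sample, then clip so the change never exceeds epsilon.
            delta[i] = max(-epsilon, min(epsilon, delta[i] + step * direction))
    return delta, history

random.seed(0)
n = 400
clean = [0.3 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # toy "audio"
weights = [random.uniform(-0.1, 0.1) for _ in range(n)]  # toy model parameters
delta, history = craft_perturbation(clean, weights)
adversarial = [x + d for x, d in zip(clean, delta)]

print("clean score:      ", toy_score(clean, weights))
print("adversarial score:", toy_score(adversarial, weights))
print("largest change:   ", max(abs(d) for d in delta))
```

The score rises while no sample moves by more than `epsilon`, which is the trade-off the bullets describe: a measurable effect on the model with little change to how the audio sounds.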

Applying this to generative models is harder because those models break audio into chunks and assign each chunk a numerical representation called a token, so they do not provide fine-grained feedback on tiny changes to raw audio.
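A toy example may help show why chunking audio into tokens gives coarse feedback. The sketch below is an assumption made for illustration, not how any tested model tokenizes audio: the hypothetical `tokenize` function averages each chunk and picks the nearest entry in a tiny made-up `codebook`. A small nudge to the waveform leaves the tokens unchanged, so token-level output alone says nothing about whether the nudge helped.

```python
def tokenize(waveform, codebook, chunk_size):
    """Toy tokenizer: map each chunk to the index of the nearest codebook value."""
    tokens = []
    for start in range(0, len(waveform), chunk_size):
        chunk = waveform[start:start + chunk_size]
        mean = sum(chunk) / len(chunk)
        # Nearest-neighbor lookup turns a continuous value into a discrete ID.
        token = min(range(len(codebook)), key=lambda k: abs(codebook[k] - mean))
        tokens.append(token)
    return tokens

codebook = [-0.5, 0.0, 0.5]  # hypothetical three-entry codebook
clean = [0.1, 0.12, 0.09, 0.11, 0.48, 0.52, 0.5, 0.49]
nudged = [x + 0.001 for x in clean]  # tiny change to the raw audio
shifted = [x + 0.2 for x in clean]   # much larger change

print(tokenize(clean, codebook, 4))    # [1, 2]
print(tokenize(nudged, codebook, 4))   # [1, 2], identical to the clean clip
print(tokenize(shifted, codebook, 4))  # [2, 2], only a large change flips a token
```

In this toy setup the tokens respond only once a change is large enough to cross a boundary between codebook entries, which is the kind of coarse feedback the paragraph above refers to.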

Real-world scenarios

›Hiding malicious instructions in online videos, music clips or voice notes that users query an AI about.
›Broadcasting malicious audio on a Zoom call later uploaded to AI transcription services.
›Injecting malicious audio into a live voice chat with an AI in real time.

Chen said the real-time scenario comes from the team's more recent, unpublished studies.

Frequently Asked Questions

What is AudioHijack?

It is a technique that hides malicious instructions in audio clips, in a form imperceptible to human ears, to make large audio-language models perform unauthorized actions.

How effective is the attack?

The researchers report an average success rate of 79 to 96 percent across the models they tested.

What models were tested?

The team tested 13 leading open models, along with commercial AI voice services from Microsoft and Mistral.

Why is this attack hard to defend against?

The clips sound normal to humans, are context-agnostic so they work regardless of the user's spoken instructions, and can be reused against the same model.

What harmful actions can result?

Researchers showed models could be made to conduct sensitive web searches, download files from attacker-controlled sources, and send emails containing user data.

The findings highlight a security flaw in audio-language models that accept spoken instructions, since hidden audio can drive them to act without a user's knowledge.