Fluid, natural voice translation with Gemini 3.5 Live Translate

Overview

Google DeepMind released Gemini 3.5 Live Translate, an audio model that turns spoken words into translated speech within a few seconds while keeping the speaker's intonation, pacing, and pitch. It detects more than 70 languages automatically and is rolling out across the Gemini Live API, the Google Translate apps, and Google Meet. In Meet, the upgrade widens supported languages from 5 to more than 70, enabling over 2,000 language combinations inside a single call.

Key Takeaways

Gemini 3.5 Live Translate produces spoken translations within a few seconds of the speaker, streaming continuously rather than waiting for each sentence to finish.
The model automatically detects more than 70 languages, so users do not pick a source language before speaking.
The translated voice keeps the original speaker's intonation, pacing, and pitch instead of sounding like a flat machine readout.
It is launching in three places at once: the Gemini Live API in public preview, the Google Translate Android and iOS apps, and Google Meet in private preview for select business customers.
Inside Google Meet, supported languages climb from 5 to more than 70, opening over 2,000 language combinations in one meeting.
Every translated audio output carries a SynthID watermark so the speech is identifiable as AI generated.

Stats & Key Facts

#70+ languages detected automatically by the model
#2,000+ language combinations supported within a single Google Meet call
#5 to 70+ languages supported in Google Meet after the upgrade
#1 trillion+ words translated every month across Google products
#20 years of translation work at Google preceding this release
#10 million+ voice calls placed each month on the platform of early tester Grab

Near Real-Time Speech That Stays in Sync With the Speaker

The core advance is timing: the translation arrives while the person is still talking.

Gemini 3.5 Live Translate listens to a person speaking and produces a spoken translation in another language within a few seconds. Instead of waiting for each sentence to finish, it streams the translation continuously, so the output trails the speaker by only a short gap and the conversation keeps close to its natural pace.

The design constantly balances two competing goals. Waiting longer gathers more context and improves accuracy, while responding faster keeps the translation in sync with the speaker. The model trades these off continuously, aiming for speech that is both timely and accurate.
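Google does not publish how the model decides when to start speaking. One well-known way to frame this trade-off in simultaneous translation research is the wait-k policy: read k source tokens before producing output, then emit one output token for each additional source token read. The Python sketch below illustrates that generic policy only, not Gemini's method; the function names and the uppercasing "translation" stand-in are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def wait_k_schedule(source: List[str], k: int) -> List[Tuple[int, str]]:
    """Simulate a generic wait-k streaming policy (not Gemini's method).

    Returns one (tokens_read, output_token) pair per output step. The
    "translation" is a hypothetical stand-in that uppercases the aligned
    source token; a real system would run a model on source[:tokens_read].
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    schedule: List[Tuple[int, str]] = []
    for i, token in enumerate(source):
        # Before emitting output i, stay k tokens ahead of it,
        # unless the source has already been fully read.
        tokens_read = min(i + k, len(source))
        schedule.append((tokens_read, token.upper()))
    return schedule


def mean_lookahead(schedule: List[Tuple[int, str]]) -> float:
    """Average number of source tokens seen beyond the one being emitted."""
    return float(mean(read - (i + 1) for i, (read, _) in enumerate(schedule)))


if __name__ == "__main__":
    words = "the meeting starts at nine".split()
    for k in (1, 3):
        sched = wait_k_schedule(words, k)
        print(k, [read for read, _ in sched], mean_lookahead(sched))
```

For the five-word input, k=1 emits after reading tokens [1, 2, 3, 4, 5] with no lookahead, while k=3 reads [3, 4, 5, 5, 5] and averages 1.4 tokens of lookahead. A larger k gives each decision more context at the cost of trailing the speaker further, which is the tension the paragraph above describes.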

Keeping the Speaker's Own Voice and Detecting 70+ Languages

Three features set the model apart from older translation tools.

›It preserves the speaker's intonation, pacing, and pitch, so the translated voice sounds like the original person rather than a generic synthetic readout.
›It automatically detects more than 70 languages, removing the step of choosing a source language before a conversation starts.
›It is built to stay reliable in noisy and unpredictable settings, not only quiet recording conditions.

Rolling Out Across the Gemini Live API, Google Translate, and Google Meet

Google is shipping the technology in three products at the same time.

›Developers reach it through the Gemini Live API in public preview inside Google AI Studio, so they can build it into their own apps.
›Google Translate adds it to its Android and iOS apps in a global release, including a new Android listening mode that plays the translation through the phone's earpiece, so no headphones are needed.
›Google Meet receives it in private preview for select business Workspace customers starting this month.

Google Meet Goes From 5 Languages to More Than 70

The biggest single jump in coverage lands inside video meetings.

For Google Meet, the upgrade widens supported languages from 5 to more than 70. Earlier setups were limited to translating to and from English, which restricted who could join a multilingual call. The new model opens more than 2,000 language combinations within a single meeting.
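Google does not say exactly how it counts combinations, so the following is illustrative arithmetic rather than its stated method. If a combination means an unordered pair of distinct languages, n supported languages give n(n-1)/2 pairs, so coverage grows roughly with the square of the language count:

$$\binom{5}{2} = \frac{5 \cdot 4}{2} = 10, \qquad \binom{70}{2} = \frac{70 \cdot 69}{2} = 2{,}415$$

Counting each translation direction separately doubles these figures to 20 and 4,830. Either reading is consistent with "more than 2,000".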

For business readers, this means a meeting with participants speaking different languages no longer routes everyone through English. Each person hears others in their own language, in something close to real time, which lowers the friction of cross-border calls.

The Scale Behind the Launch

Google frames the release against the volume of translation it already handles.

›More than 1 trillion words are translated every month across Google products, the workload the new model has to serve.
›Translation work at Google started 20 years ago, and this release continues that long effort.
›Early tester Grab, a ride-hailing and delivery company, sees more than 10 million voice calls each month, a setting where fast spoken translation helps drivers and travelers understand each other.

Early Testers and the SynthID Watermark

A set of named partners is testing the model, and every output is marked as AI generated.

›Early testers and developer partners include Grab, CJ ENM, LiveKit, Agora, Fishjam, Pipecat, and Vision Agents.
›Every translated audio output is watermarked with SynthID, an imperceptible marker that lets the speech be identified as AI generated.
›The watermark is positioned as a guard against misuse, since it keeps synthetic audio traceable even after it spreads.

Frequently Asked Questions

What is Gemini 3.5 Live Translate?

It is an audio model from Google DeepMind that listens to a person speaking and produces a spoken translation in another language within a few seconds. It keeps the speaker's intonation, pacing, and pitch so the translated voice resembles the original person.

How many languages does it support?

The model automatically detects more than 70 languages. Inside Google Meet, support rises from 5 languages to more than 70, enabling over 2,000 language combinations in a single call.

Where can people use it?

It is rolling out in three places: the Gemini Live API in public preview through Google AI Studio for developers, the Google Translate Android and iOS apps in a global release, and Google Meet in private preview for select business Workspace customers.

Is the translated audio labeled as AI generated?

Yes. Every translated output carries a SynthID watermark, an imperceptible marker that lets the speech be identified as AI generated and helps guard against misuse.

How fast is the translation?

The system streams continuously and stays only a few seconds behind the speaker, rather than waiting for each sentence to finish. It balances gathering enough context for accuracy against responding fast enough to keep pace with the conversation.

Gemini 3.5 Live Translate moves spoken translation closer to a natural conversation by staying in sync with the speaker and keeping the speaker's own voice. With rollouts across developer tools, the Translate apps, and Google Meet, it widens who can join a multilingual conversation without routing everyone through English.