The news, in brief
Google just pushed two meaningful Gemini updates that bring AI closer to everyday life and developer workflows. On December 12, 2025, Google Translate began rolling out a beta that streams real-time, natural‑sounding translations straight to any connected headphones on Android—no Pixel Buds required. The experience is powered by Gemini’s upgraded native‑audio models and supports 70+ languages at launch in the U.S., Mexico, and India, with iOS and more countries coming in 2026. Google and press reports confirm the scope, positioning, and rollout details.
A day earlier (December 11, 2025), Google introduced the Interactions API—now in public beta via the Gemini API in AI Studio—a single endpoint that unifies how developers talk to Gemini models and agents, manage state server‑side, orchestrate tools, and run long tasks in the background. The beta also exposes Google’s Gemini Deep Research agent. Google’s developer blog and the documentation outline the design and caveats.

Live headphone translation: what’s new and how it works
Gemini’s native‑audio upgrade gives Translate a new “Live translate” mode that does more than speak a translated result—it preserves tone, cadence, and emphasis so the voice in your ears sounds human, not robotic. In continuous‑listening mode, you wear headphones and hear the world in your language; in two‑way mode, you hear your preferred language while your phone speaks the other language aloud for your conversation partner. Google says this covers 70+ languages and thousands of language pairs, with auto language detection, multilingual input in a single session, and noise robustness for busy spaces. Details here.
Availability and compatibility
- Platform and regions: Android beta in the U.S., Mexico, and India as of December 12, 2025; iOS and more countries are planned for 2026. Source.
- Headphones: Works with any wired or wireless headphones connected to your Android phone; it’s not limited to Pixel Buds. Coverage, Google.
- Language quality: Text translation quality in Translate and Search also improved with Gemini, helping with idioms and slang (e.g., “stealing my thunder”). Source.
How it compares
Apple’s Live Translation routes real‑time translations to compatible AirPods when paired to an iPhone with Apple Intelligence, but at launch it was AirPods‑only and initially had EU availability restrictions that Apple later moved to lift in November 2025. See Apple’s support guide and newsroom update for the EU rollout plan (Apple IE). Google’s approach is notable for working with any headphones from day one of its Android beta and spanning 70+ languages.
At‑a‑glance
| Update | What it does | Where it’s available | Notes |
|---|---|---|---|
| Live headphone translation (Translate) | Streams real‑time speech‑to‑speech translations to your headphones, preserving tone and pacing | Android beta in U.S., Mexico, India (Dec 12, 2025) | Any headphones; 70+ languages; iOS and more countries in 2026. Google |
| Gemini native‑audio upgrade | Improves live voice agents and adds speech‑to‑speech translation capabilities | Across Google surfaces (AI Studio, Vertex AI; rolling into Gemini Live, Search Live) | Adds continuous listening, two‑way mode, style transfer, multilingual input. Details |
| Interactions API (public beta) | Unifies model/agent calls, server‑side state, background execution, tool orchestration | Gemini API in AI Studio (Dec 11, 2025) | Supports agents (Deep Research), previous_interaction_id, optional storage; beta caveats. Docs |
Under the hood: Gemini 2.5’s native audio gets practical
Google’s updated Gemini 2.5 Flash Native Audio model underpins both the conversational “feel” and the translation flow. The model’s sharper function calling and improved instruction following help it interleave tool calls and dialogue without breaking rhythm—useful when Translate needs to detect languages, resolve ambiguity, and render speech naturally mid‑conversation. Google also highlights continuous listening, automatic language detection, and robustness to ambient noise—all critical for translation you can trust in real-life settings. Technical notes.
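For a sense of how that audio stack is exposed to developers today, here is a hedged sketch using the Live API in the google-genai Python SDK. The model identifier, the system instruction, and the shortcut of sending a text turn instead of streaming microphone audio are simplifications and assumptions for illustration, not details confirmed by the article.

```python
# Hedged sketch: streaming translated speech from a Gemini native-audio model
# via the Live API in the google-genai Python SDK. The model name below is an
# assumption (check AI Studio for the current native-audio identifier); a real
# live-translation app would stream microphone audio in rather than a text turn.
import asyncio
from google import genai

client = genai.Client()  # reads the Gemini API key from the environment

MODEL = "gemini-2.5-flash-native-audio-preview"  # placeholder/assumed identifier
CONFIG = {
    "response_modalities": ["AUDIO"],
    "system_instruction": (
        "Translate everything the user says into Spanish, "
        "preserving tone and emphasis."
    ),
}

async def main():
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # One text turn for brevity; continuous audio input would use the
        # SDK's realtime-input methods instead.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "You're stealing my thunder!"}]},
            turn_complete=True,
        )
        audio = bytearray()
        async for msg in session.receive():
            if msg.data:               # raw PCM audio chunks from the model
                audio.extend(msg.data)
        # `audio` now holds raw PCM that a player (or your headphones) could render.
        print(f"Received {len(audio)} bytes of translated speech")

asyncio.run(main())
```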
For builders: the new Interactions API
The Interactions API is a clean, session‑centric interface built around an “Interaction” resource that records inputs, model thoughts, tool calls/results, and outputs. Compared with the older, stateless generateContent, this API adds:
- Server‑side state (optional): Reference `previous_interaction_id` instead of resending history (see the request sketch after this list).
- Background execution: Run long agent loops with `background=true` without holding open a client connection.
- Tool and agent orchestration: Call built‑in tools, structure outputs, and mix model and agent turns (e.g., hand off to the Gemini Deep Research agent for long‑horizon tasks, then resume with a standard model). The initial built‑in agent is `deep-research-pro-preview-12-2025`.
- Storage controls: Interactions are stored by default (`store=true`) to enable state and background jobs, with retention of 55 days (Paid Tier) or 1 day (Free Tier). You can opt out via `store=false`, noting that this disables background mode and stateful continuation. Docs, announcement.
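To make the flow concrete, here is a minimal sketch of two chained turns against the REST endpoint. Only the `/v1beta/interactions` path, the `model`, `previous_interaction_id`, and `store` behavior come from the material above; the `input` payload field and the `id` response field are assumptions made for illustration, so check the Interactions API docs for the exact request and response shapes.

```python
# Minimal sketch (not an official sample): two chained Interactions API turns.
# The endpoint path and the model / previous_interaction_id / store fields are
# named in the docs summarized above; the "input" payload field and the "id"
# response field are assumptions for illustration.
import os
import requests

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"
HEADERS = {
    "x-goog-api-key": os.environ["GEMINI_API_KEY"],  # standard Gemini API key header
    "Content-Type": "application/json",
}

# Turn 1: a plain model call. Storage defaults to store=true, which is what
# enables server-side state for the follow-up turn.
first = requests.post(BASE_URL, headers=HEADERS, json={
    "model": "gemini-2.5-flash",
    "input": "Summarize the trade-offs of keeping conversation state server-side.",
}, timeout=60).json()

# Turn 2: continue the conversation without resending history by pointing at
# the stored interaction from turn 1.
second = requests.post(BASE_URL, headers=HEADERS, json={
    "model": "gemini-2.5-flash",
    "previous_interaction_id": first["id"],
    "input": "Now compress that into one paragraph.",
}, timeout=60).json()

print(second)
```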
Tip: Getting started quickly
- In AI Studio, create a new key, then call the `/v1beta/interactions` endpoint with either a `model` (e.g., `gemini-2.5-flash`) or an `agent` (e.g., the Deep Research preview); a background-run sketch follows this list.
- Chain turns by passing `previous_interaction_id` to keep context on the server.
- Stay mindful of beta caveats: some features (for example, Computer Use and Maps grounding) are not yet supported, and tool output ordering may occasionally appear before execution in logs. See “Limitations” in the docs.
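Along the same lines, a background agent run might look like the sketch below. The agent id and the `background`/`store` fields are taken from the notes above; the polling pattern, status values, and `output` field are assumptions, since the beta docs define the actual job lifecycle.

```python
# Hedged sketch: start a long-running Deep Research turn in the background and
# poll until it finishes. Field names beyond agent/background/store (and the
# status values) are assumptions for illustration only.
import os
import time
import requests

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"
HEADERS = {
    "x-goog-api-key": os.environ["GEMINI_API_KEY"],
    "Content-Type": "application/json",
}

# Kick off the agent without holding a client connection open for the whole run.
job = requests.post(BASE_URL, headers=HEADERS, json={
    "agent": "deep-research-pro-preview-12-2025",
    "input": "Survey recent approaches to low-latency speech-to-speech translation.",
    "background": True,
    "store": True,  # background execution relies on stored interactions
}, timeout=60).json()

# Poll the stored interaction until the agent completes or fails.
while True:
    status = requests.get(f"{BASE_URL}/{job['id']}", headers=HEADERS, timeout=60).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(15)

print(status.get("output"))  # assumed field carrying the agent's final report
```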
Why this matters for automation and productivity
- Faster time‑to‑value: Teams can ship voice and agent experiences with fewer moving parts—Translate’s headphone beta shows what high‑quality, low‑latency speech‑to‑speech feels like when the audio stack is handled by the model rather than a patchwork of ASR + TTS services.
- Consistency across surfaces: The same native‑audio advances are landing in Gemini Live and Search Live, hinting at a consistent interaction model whether you’re chatting with an assistant, searching hands‑free, or translating on the go. Google.
- Agentic design that scales: Interactions’ server‑side state and background execution are the plumbing many production agents lacked. The beta label is real, but the building blocks are now consolidated.
Bottom line
If 2023–2024 was about proving that AI could write, 2025 is the year it starts to listen—and listen well. Google’s headphone translation lets anyone with an Android phone and earbuds break language barriers in real time, while the Interactions API gives developers a sturdier foundation for agentic apps. Together, they point toward more natural, less UI‑heavy experiences—spoken in your language, routed by an API that finally matches how modern AI systems work.
Sources
- Google: “Bringing state‑of‑the‑art Gemini translation capabilities to Google Translate” (Dec 12, 2025) — rollout, any‑headphone support, iOS timing. Link
- Google: “Improved Gemini audio models for powerful voice interactions” — native‑audio upgrades and live speech‑to‑speech translation details. Link
- Google: “Interactions API: A unified foundation for models and agents” (Dec 11, 2025). Link
- Google Developers docs: Interactions API (Beta), state, retention, limitations. Link
- The Verge: Coverage confirming support for any headphones and initial regions. Link
- TechCrunch: Rollout, language count, and context examples. Link
- Ars Technica: Expanded headphone support and Gemini‑powered quality improvements. Link
- Apple Support: Live Translation on iPhone and with AirPods (device and language availability). Link
- Apple Newsroom (IE): EU availability update for Live Translation on AirPods (Nov 4, 2025). Link