Improved Gemini audio models for powerful voice experiences

2025-12-12



Google released an updated Gemini 2.5 Flash Native Audio model for live voice agents and introduced live speech translation capabilities.

•Sharper function calling: scores 71.5% on ComplexFuncBench Audio, enabling accurate real-time data retrieval without breaking the conversation flow
•Robust instruction following: adherence rate improved from 84% to 90%, delivering more reliable outputs for complex developer instructions
•Smoother multi-turn conversations: better context retrieval from previous turns for more cohesive dialogues
•Live speech translation supports 70+ languages and 2000+ language pairs, preserving the speaker's intonation, pacing, and pitch
•A beta of the live translation feature is now rolling out in the Google Translate app on Android in the US, Mexico, and India
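For a sense of scale, the instruction-following figure can also be read as a miss rate. This assumes the adherence rate is the fraction of instructions the model follows, since the summary does not define the metric. Under that reading, the share of instructions missed falls from 16% to 10%:

```latex
% Assumption: adherence rate = fraction of instructions followed
\text{miss rate: } 1 - 0.84 = 0.16 \;\longrightarrow\; 1 - 0.90 = 0.10,
\qquad
\text{relative reduction} = \frac{0.16 - 0.10}{0.16} = 0.375
```

That works out to about 37.5% fewer missed instructions, or roughly three fewer misses for every eight the previous model made.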
