Gemini Live API Development Skill
Overview
The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
Key capabilities:
- Bidirectional audio streaming: real-time mic-to-speaker conversations
- Video streaming: send camera/screen frames alongside audio
- Text input/output: send and receive text within a live session
- Audio transcriptions: get text transcripts of both input and output audio
- Voice Activity Detection (VAD): automatic interruption handling
- Native audio: thinking (with configurable thinkingLevel)
- Function calling: synchronous tool use
- Google Search grounding: ground responses in real-time search results
- Session management: context compression, session resumption, GoAway signals
- Ephemeral tokens: secure client-side authentication
[!NOTE]
The Live API currently only supports WebSockets. For WebRTC support or simplified integration, use a partner integration.
Models
gemini-3.1-flash-live-preview: Optimized for low-latency, real-time dialogue. Native audio output, thinking (via thinkingLevel). 128k context window. This is the recommended model for all Live API use cases.
[!WARNING]
The following Live API models are deprecated and will be shut down. Migrate to gemini-3.1-flash-live-preview.
gemini-2.5-flash-native-audio-preview-12-2025: Migrate to gemini-3.1-flash-live-preview.
gemini-live-2.5-flash-preview: Released June 17, 2025. Shutdown: December 9, 2025.
gemini-2.0-flash-live-001: Released April 9, 2025. Shutdown: December 9, 2025.
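One way to keep deprecated names out of new sessions is a small lookup that swaps them for the recommended model. This is only a sketch: the model names come from the list above, while `resolve_live_model` and the constant names are hypothetical and not part of any SDK.

```python
# Hypothetical helper; model names are taken from the deprecation list above.
RECOMMENDED_MODEL = "gemini-3.1-flash-live-preview"

DEPRECATED_MODELS = frozenset({
    "gemini-2.5-flash-native-audio-preview-12-2025",
    "gemini-live-2.5-flash-preview",
    "gemini-2.0-flash-live-001",
})


def resolve_live_model(name: str) -> str:
    """Return the recommended model when `name` is a deprecated Live API model."""
    if name in DEPRECATED_MODELS:
        return RECOMMENDED_MODEL
    # Unknown or current names pass through unchanged.
    return name
```

Calling `resolve_live_model(configured_name)` before opening a session leaves current names untouched and redirects only the three deprecated ones.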
SDKs
- Python: google-genai (install with pip install google-genai)
- JavaScript/TypeScript: @google/genai (install with npm install @google/genai)
[!WARNING]
Legacy SDKs google-generativeai (Python) and @google/generative-ai (JS) are deprecated. Use the new SDKs above.
Partner Integrations
To streamline real-time audio/video app development, use a third-party integration supporting the Gemini Live API over WebRTC or WebSockets:
- LiveKit: Use the Gemini Live API with LiveKit Agents.
- Pipecat by Daily: Create a real-time AI chatbot using Gemini Live and Pipecat.
- Fishjam by Software Mansion: Create live video and audio streaming applications with Fishjam.
- Vision Agents by Stream: Build real-time voice and video AI applications with Vision Agents.
- Voximplant: Connect inbound and outbound calls to Live API with Voximplant.
- Firebase AI SDK: Get started with the Gemini Live API using Firebase AI Logic.
Audio Formats
- Input: Raw PCM, little-endian, 16-bit, mono. The native sample rate is 16 kHz; audio at other rates is resampled. MIME type: audio/pcm;rate=16000
- Output: Raw PCM, little-endian, 16-bit, mono. 24kHz sample rate.
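If your capture pipeline produces floating-point samples, they need to be packed into the input format described above before sending. The following is a minimal sketch using only the standard library. The helper names (`floats_to_pcm16`, `pcm_duration_seconds`) are hypothetical, and the sketch assumes samples are already mono and normalized to [-1.0, 1.0].

```python
import struct

INPUT_SAMPLE_RATE = 16_000   # Hz, native input rate listed above
OUTPUT_SAMPLE_RATE = 24_000  # Hz, output rate listed above
BYTES_PER_SAMPLE = 2         # 16-bit samples
INPUT_MIME_TYPE = f"audio/pcm;rate={INPUT_SAMPLE_RATE}"


def floats_to_pcm16(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit little-endian mono PCM."""
    # Clip out-of-range values so they cannot overflow a signed 16-bit int.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" selects little-endian; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_duration_seconds(pcm: bytes, sample_rate: int) -> float:
    """Duration of a 16-bit mono PCM buffer at the given sample rate."""
    return len(pcm) / (BYTES_PER_SAMPLE * sample_rate)
```

The same duration helper works for output audio when called with `OUTPUT_SAMPLE_RATE`, since both directions use 16-bit mono samples.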
[!IMPORTANT]
Use send_realtime_input / sendRealtimeInput for all real-time user input (audio, video, and text). send_client_content / sendClientContent is only supported for seeding initial context history (requires setting initial_history_in_client_content in history_config). Do not use it to send new user messages during the conversation.
[!WARNING]
Do not use media in sendRealtimeInput. Use the specific keys: audio for audio data, video for images/video frames, and text for text input.
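To make the key choice explicit in application code, a small dispatcher can map a payload's MIME type to the key named in the warning above. This is an illustrative sketch: `realtime_input_key` is a hypothetical helper, not an SDK function, and it covers only binary payloads (text is passed as a plain string under the text key).

```python
def realtime_input_key(mime_type: str) -> str:
    """Pick the realtime-input key for a binary payload's MIME type.

    Hypothetical helper: audio data goes under "audio"; images and
    video frames go under "video". The generic "media" key is never used.
    """
    if mime_type.startswith("audio/"):
        return "audio"
    if mime_type.startswith(("image/", "video/")):
        return "video"
    raise ValueError(f"Unsupported realtime input MIME type: {mime_type!r}")
```

In Python the returned name can be used as the keyword argument to send_realtime_input; in JavaScript it is the property name of the object passed to sendRealtimeInput.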
Quick Start
Authentication
Python
from google import genai
client = genai.Client(api_key="YOUR_API_KEY")
JavaScript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });
Connecting to the Live API
Python
from google.genai import types
config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    system_instruction=types.Content(
        parts=[types.Part(text="You are a helpful assistant.")]
    ),
)

# `async with` must run inside a coroutine (for example, under asyncio.run).
async with client.aio.live.connect(model="gemini-3.1-flash-live-preview", config=config) as session:
    pass  # Send and receive messages here.
JavaScript
const session = await ai.live.connect({
  model: 'gemini-3.1-flash-live-preview',
  config: {
    responseModalities: ['audio'],
    systemInstruction: { parts: [{ text: 'You are a helpful assistant.' }] }
  },
  callbacks: {
    onopen: () => console.log('Connected'),
    onmessage: (response) => console.log('Message:', response),
    onerror: (error) => console.error('Error:', error),
    onclose: () => console.log('Closed')
  }
});
Sending Text
Python
await session.send_realtime_input(text="Hello, how are you?")
JavaScript
session.sendRealtimeInput({ text: 'Hello, how are you?' });
Sending Audio
Python
await session.send_realtime_input(
    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
)
JavaScript
session.sendRealtimeInput({
  audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' }
});
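The snippets above send one `chunk` at a time. If you start from a longer PCM buffer, it has to be split first. The sketch below is one way to do that; `iter_pcm_chunks` is a hypothetical helper, and the 100 ms default chunk length is an assumption chosen for illustration, not a documented requirement.

```python
def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100, sample_rate: int = 16_000):
    """Yield consecutive slices of 16-bit mono PCM covering `chunk_ms` each.

    The 100 ms default is an illustrative assumption; the last chunk may
    be shorter than the others.
    """
    samples_per_chunk = sample_rate * chunk_ms // 1000
    if samples_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    # Two bytes per sample, so chunk boundaries never split a sample.
    chunk_bytes = samples_per_chunk * 2
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Each yielded slice can then be sent as the audio payload with MIME type audio/pcm;rate=16000, exactly as in the Python snippet above.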
Sending Video
Python
await session.send_realtime_input(
    video=types.Blob(data=frame, mime_type="image/jpeg")
)
JavaScript
session.sendRealtimeInput({
  video: { data: frame.toString('base64'), mimeType: 'image/jpeg' }
});
Receiving Audio and Text
[!IMPORTANT]
A single server event can contain multiple content parts simultaneously (e.g., audio chunks and transcript). Always process all parts in each event to avoid missing content.
Python
async for response in session.receive():
    content = response.server_content
    if content:
        # Check every field: one event can carry audio and transcripts together.
        if content.model_turn:
            for part in content.model_turn.parts:
                if part.inline_data:
                    # Raw 16-bit PCM audio at 24kHz.
                    audio_data = part.inline_data.data
        if content.input_transcription:
            print(f"User: {content.input_transcription.text}")
        if content.output_transcription:
            print(f"Gemini: {content.output_transcription.text}")
        if content.interrupted is True:
            # The user interrupted the model, e.g. stop playback here.
            pass
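The `audio_data` values collected in the loop above are headerless PCM, so most players will not open them directly. As a rough sketch, the standard-library `wave` module can wrap them in a WAV container. `pcm_to_wav_bytes` is a hypothetical helper, and it assumes the output format stated earlier (16-bit, mono, 24kHz).

```python
import io
import wave


def pcm_to_wav_bytes(pcm_chunks, sample_rate: int = 24_000) -> bytes:
    """Wrap raw 16-bit mono PCM chunks in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)
    # The wave module leaves a caller-supplied buffer open after closing.
    return buf.getvalue()
```

For example, appending each `audio_data` to a list during the receive loop and passing that list to `pcm_to_wav_bytes` yields bytes that can be written to a `.wav` file for debugging.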
JavaScript
const content = response.serverContent;
if (content?.modelTurn?.parts) {
for <