Google Gemini API - Complete Guide
Version: 3.0.0 (14 Known Issues Added)
Package: @google/genai (⚠️ NOT @google/generative-ai)
Last Updated: 2026-01-21
⚠️ CRITICAL SDK MIGRATION WARNING
DEPRECATED SDK: @google/generative-ai (sunset November 30, 2025)
CURRENT SDK: @google/genai v1.27+
If you see code using @google/generative-ai, it's outdated!
This skill uses the correct current SDK and provides a complete migration guide.
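A quick way to spot the outdated package is to look at a project's declared dependencies. The sketch below is illustrative only: `checkGeminiSdk` is a hypothetical helper (not part of either SDK), and the version string in the sample is made up for the example.

```javascript
// Hypothetical helper: reports which Gemini SDK packages a parsed
// package.json object declares in dependencies or devDependencies.
function checkGeminiSdk(pkg) {
  const deps = {
    ...(pkg.dependencies || {}),
    ...(pkg.devDependencies || {}),
  };
  return {
    usesDeprecated: '@google/generative-ai' in deps,
    usesCurrent: '@google/genai' in deps,
  };
}

// Sample input: a project that still depends on the deprecated package.
const report = checkGeminiSdk({
  dependencies: { '@google/generative-ai': '^0.21.0' },
});
console.log(report.usesDeprecated); // true
console.log(report.usesCurrent); // false
```

If `usesDeprecated` is true, the project is a candidate for the migration guide referenced above.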
Status
✅ Phase 1 Complete:
- ✅ Text Generation (basic + streaming)
- ✅ Multimodal Inputs (images, video, audio, PDFs)
- ✅ Function Calling (basic + parallel execution)
- ✅ System Instructions & Multi-turn Chat
- ✅ Thinking Mode Configuration
- ✅ Generation Parameters (temperature, top-p, top-k, stop sequences)
- ✅ Both Node.js SDK (@google/genai) and fetch approaches
✅ Phase 2 Complete:
- ✅ Context Caching (cost optimization with TTL-based caching)
- ✅ Code Execution (built-in Python interpreter and sandbox)
- ✅ Grounding with Google Search (real-time web information + citations)
📦 Separate Skills:
- Embeddings: See google-gemini-embeddings skill for text-embedding-004
Table of Contents
Phase 1 - Core Features:
1. Quick Start
2. Current Models (2025)
3. SDK vs Fetch Approaches
4. Text Generation
5. Streaming
6. Multimodal Inputs
7. Function Calling
8. System Instructions
9. Multi-turn Chat
10. Thinking Mode
11. Generation Configuration
Phase 2 - Advanced Features:
12. Context Caching
13. Code Execution
14. Grounding with Google Search
Common Reference:
15. Known Issues Prevention
16. Error Handling
17. Rate Limits
18. SDK Migration Guide
19. Production Best Practices
Quick Start
Installation
CORRECT SDK:
npm install @google/genai
❌ WRONG (DEPRECATED):
npm install @google/generative-ai
Environment Setup
export GEMINI_API_KEY="..."
Or create .env file:
GEMINI_API_KEY=...
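A missing key usually surfaces later as an opaque authentication error, so it can help to fail fast at startup. A minimal sketch, assuming the key lives in an environment-like object such as `process.env` in Node.js or the `env` binding in Cloudflare Workers; `requireGeminiApiKey` is a hypothetical helper name and the key value is a placeholder.

```javascript
// Hypothetical helper: read GEMINI_API_KEY from an environment-like object
// and throw a clear error if it is missing or blank.
function requireGeminiApiKey(env) {
  const key = env && env.GEMINI_API_KEY;
  if (typeof key !== 'string' || key.trim() === '') {
    throw new Error('GEMINI_API_KEY is not set');
  }
  return key.trim();
}

// Sample call with a placeholder value; in Node.js you would pass process.env.
const apiKey = requireGeminiApiKey({ GEMINI_API_KEY: ' example-key ' });
console.log(apiKey); // example-key
```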
First Text Generation (Node.js SDK)
import { GoogleGenAI } from '@google/genai';

// Create a client using the API key from the environment.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Send a single prompt and print the generated text.
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Explain quantum computing in simple terms'
});
console.log(response.text);
First Text Generation (Fetch - Cloudflare Workers)
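// `env` is the Cloudflare Workers environment bindings object.
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Explain quantum computing in simple terms' }] }]
    }),
  }
);
// fetch does not reject on HTTP error statuses, so check before parsing.
if (!response.ok) {
  throw new Error(`Gemini API request failed with status ${response.status}`);
}
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);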
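The fetch example reads `data.candidates[0].content.parts[0].text` directly, which throws if any level is missing and ignores additional parts. A more defensive sketch follows; `extractText` is a hypothetical helper, and the sample object only mirrors the response shape used in the example above.

```javascript
// Hypothetical helper: join the text of every part in the first candidate,
// returning an empty string when the expected fields are absent.
function extractText(data) {
  const parts = data?.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter((part) => typeof part.text === 'string')
    .map((part) => part.text)
    .join('');
}

// Sample object shaped like the REST response in the example above.
const sample = {
  candidates: [{ content: { parts: [{ text: 'Hello, ' }, { text: 'world' }] } }],
};
console.log(extractText(sample)); // Hello, world
console.log(extractText({}) === ''); // true
```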
Current Models (2025)
Gemini 3 Series (December 2025)
gemini-3-flash
- Context: 1,048,576 input tokens / 65,536 output tokens
- Status: Generally Available (December 2025)
- Description: Google's fastest and most efficient Gemini 3 model for production workloads
- Best for: High-throughput applications, low-latency responses, cost-sensitive production
- Features: Enhanced multimodal, function calling, streaming, thinking mode
- Benchmark Performance: Matches gemini-2.5-pro quality at gemini-2.5-flash speed/cost
- Recommended for: Production use cases requiring speed + quality balance
gemini-3-pro-preview
- Context: TBD (documentation pending)
- Status: Preview release (November 18, 2025)
- Description: Google's newest and most intelligent AI model with state-of-the-art reasoning
- Best for: Most complex reasoning tasks, advanced multimodal understanding, benchmark-critical applications
- Features: Enhanced multimodal (text, image, video, audio, PDF), function calling, streaming
- Benchmark Performance: Outperforms Gemini 2.5 Pro on every major AI benchmark
- ⚠️ Preview Models Warning: Preview models have NO SLAs and can change or be deprecated with little notice. Use GA (generally available) models for production. See Issue #13.
Gemini 2.5 Series (General Availability - Stable)
gemini-2.5-pro
- Context: 1,048,576 input tokens / 65,536 output tokens
- Description: State-of-the-art thinking model for complex reasoning
- Best for: Code, math, STEM, complex problem-solving
- Features: Thinking mode (default on), function calling, multimodal, streaming
- Knowledge cutoff: January 2025
gemini-2.5-flash
- Context: 1,048,576 input tokens / 65,536 output tokens
- Description: Best price-performance workhorse model
- Best for: Large-scale processing, low-latency, high-volume, agentic use cases
- Features: Thinking mode (default on), function calling, multimodal, streaming
- Knowledge cutoff: January 2025
gemini-2.5-flash-lite
- Context: 1,048,576 input tokens / 65,536 output tokens
- Description: Cost-optimized, fastest 2.5 model
- Best for: High throughput, cost-sensitive applications
- Features: Thinking mode (default on), function calling, multimodal, streaming
- Knowledge cutoff: January 2025
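The preview warning above can be enforced in code with a small guard. This is a sketch only: the status values are copied from the model list in this section, while the `MODEL_STATUS` map and `assertProductionModel` are hypothetical names, not part of @google/genai.

```javascript
// Status per model ID, copied from the model list above (illustrative table).
const MODEL_STATUS = new Map([
  ['gemini-3-flash', 'ga'],
  ['gemini-3-pro-preview', 'preview'],
  ['gemini-2.5-pro', 'stable'],
  ['gemini-2.5-flash', 'stable'],
  ['gemini-2.5-flash-lite', 'stable'],
]);

// Hypothetical guard: return the model ID if it is suitable for production,
// otherwise throw with an explanation.
function assertProductionModel(model) {
  const status = MODEL_STATUS.get(model);
  if (status === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  if (status === 'preview') {
    throw new Error(`${model} is a preview model (no SLA); use a GA model in production`);
  }
  return model;
}

console.log(assertProductionModel('gemini-2.5-flash')); // gemini-2.5-flash
try {
  assertProductionModel('gemini-3-pro-preview');
} catch (err) {
  console.log(err.message);
}
```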
Model Feature Matrix
| Feature | 3-Flash | 3-Pro (Preview) | 2.5-Pro | 2.5-Flash | 2.5-Flash-Lite |
|---|---|---|---|---|---|
| Thinking Mode | ✅ Default ON | TBD | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| Function Calling | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multimodal | ✅ Enhanced | ✅ Enhanced | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ |
| System Instructions | ✅ | ✅ | ✅ | ✅ | ✅ |
| Context Window | 1,048,576 in | TBD | 1,048,576 in | 1,048,576 in | 1,048,576 in |
| Output Tokens | 65,536 max | TBD | 65,536 max | 65,536 max | 65,536 max |
| Status | GA | Preview | Stable | Stable | Stable |
⚠️ Context Window Correction
ACCURATE (Gemini 2.5): Gemini 2.5 models support 1,048,576 input tokens (NOT 2M!)
OUTDATED: Only Gemini 1.5 Pro (previous generation) had 2M token context window
GEMINI 3: gemini-3-flash is listed above at 1,048,576 input tokens; gemini-3-pro-preview specifications are pending official documentation
Common mistake: Claiming Gemini 2.5 has 2M tokens. It doesn't. This skill prevents this error.
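To make the limit concrete, the sketch below checks a token count against the 1,048,576-token input window stated above (that is 2^20). `checkInputBudget` is a hypothetical helper, and the token count is assumed to have been obtained elsewhere, for example from a token-counting call.

```javascript
// Input limit stated above for the Gemini 2.5 models (2^20 tokens).
const GEMINI_25_INPUT_LIMIT = 1048576;

// Hypothetical helper: compare a token count against the input limit.
function checkInputBudget(tokenCount, limit = GEMINI_25_INPUT_LIMIT) {
  if (!Number.isInteger(tokenCount) || tokenCount < 0) {
    throw new RangeError('tokenCount must be a non-negative integer');
  }
  return { fits: tokenCount <= limit, remaining: limit - tokenCount };
}

console.log(checkInputBudget(1000000)); // { fits: true, remaining: 48576 }
console.log(checkInputBudget(2000000).fits); // false
```

A prompt sized for a 2M-token window fails this check, which is exactly the mistake described above.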
SDK vs Fetch Approaches
Node.js SDK (@google/genai)
Pros:
- Type-safe with TypeScript
- Easier API (simpler syntax)
- Built-in chat helpers
- Automatic SSE parsing for streaming
- Better error handling
Cons:
- Requires Node.js or compatible runtime
- Larger bundle size
- May not work in all edge runtimes
Use when: Building Node.js apps, Next.js Server Actions/Components, or any environment with Node.js compatibility
Fetch-based (Direct REST API)
Pros:
- Works in any JavaScript environment (Cloudflare Workers, Deno, Bun, browsers)
- Minimal dependencies
- Smaller bundle size
- Full control over requests
Cons:
- More verbose syntax
- Manual SSE parsing for streaming
- No built-in chat helpers
- Manual error handling
Use when: Deploying to Cloudflare Workers, browser clients, or lightweight edge runtimes
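The "manual SSE parsing" item in the fetch cons list means splitting the response stream into events yourself. A minimal sketch, assuming the streaming endpoint emits standard server-sent events in which the `data:` lines of each event carry one JSON chunk; `parseSseEvents` is a hypothetical helper, and the sample payloads are simplified stand-ins rather than real API output.

```javascript
// Hypothetical helper: parse complete SSE events out of a text buffer.
// Returns the parsed JSON payloads plus any trailing partial event, which
// the caller should prepend to the next network chunk.
function parseSseEvents(buffer) {
  const normalized = buffer.replace(/\r\n/g, '\n');
  const blocks = normalized.split('\n\n');
  const rest = blocks.pop(); // incomplete event, or '' if none
  const events = [];
  for (const block of blocks) {
    const data = block
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).trimStart())
      .join('\n');
    if (data !== '') events.push(JSON.parse(data));
  }
  return { events, rest };
}

// Simulated network chunks: the first event is split across two chunks.
let buffer = '';
const texts = [];
for (const chunk of ['data: {"text":"Hel', 'lo"}\n\ndata: {"text":"!"}\n\n']) {
  const parsed = parseSseEvents(buffer + chunk);
  buffer = parsed.rest;
  for (const event of parsed.events) texts.push(event.text);
}
console.log(texts.join('')); // Hello!
```

Carrying `rest` forward matters because network chunks can end in the middle of an event.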
Text Generation
Basic Text Generation (SDK)
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Write a haiku about artificial intelligence'
});
console.log(response.text);
Basic Text Generation (Fetch)
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: 'Write a haiku about artificial intelligence' }
          ]
        }
      ]
    }),
  }
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
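Because the fetch call above repeats the same URL, headers, and body shape for every prompt, it can be convenient to build the request in one place. A sketch under that assumption: `buildGenerateContentRequest` is a hypothetical helper that reuses the endpoint and header names from the example above, and the API key shown is a placeholder.

```javascript
// Hypothetical helper: build the URL and fetch options for a single-prompt
// generateContent call, mirroring the request shape used above.
function buildGenerateContentRequest({ model, prompt, apiKey }) {
  const base = 'https://generativelanguage.googleapis.com/v1beta/models';
  return {
    url: `${base}/${encodeURIComponent(model)}:generateContent`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': apiKey,
      },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

const request = buildGenerateContentRequest({
  model: 'gemini-2.5-flash',
  prompt: 'Write a haiku about artificial intelligence',
  apiKey: 'example-key', // placeholder, not a real key
});
console.log(request.url);
// A caller would then run: await fetch(request.url, request.init)
```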
Response Structure
{
text: string,