On May 12, 2026, just days before Google I/O 2026 (scheduled for May 19-20), early evidence of Google's unreleased Gemini Omni video model surfaced in the Gemini mobile app.
According to reports and leaked samples, Gemini Omni allows users to remix videos, edit directly in chat, and generate impressive video samples from simple text prompts—with early feedback praising prompt adherence, motion quality, voice coherence, and editing capabilities like object swaps.
The leak hints at a major video generation upgrade that potentially unifies video creation with Gemini's reasoning capabilities, positioning Google to compete directly with Runway Gen-3, Pika, Kling, and other frontier video models.
This article breaks down what early testers are seeing, how Gemini Omni compares to existing models, what the leak tells us about Google I/O announcements, and what developers and creators should watch for.
What early testers are seeing in Gemini Omni
According to reports from @testingcatalog, @chatgpt21, and @chetaslua, the unreleased Gemini Omni model appeared in the Gemini mobile app with the following description:
"Meet our new video model. Remix your videos, edit directly in chat, try a template, and more."
Sample generation: suited men dining oceanside
One of the shared samples shows:
- Suited men dining at a table next to the ocean
- Shifting camera angles (multiple perspectives within a single generation)
- Clinking glasses (fine-grained motion and audio coherence)
- Oceanside environment with realistic lighting and atmosphere
@chatgpt21 commented:
"Google is COOKING 🧑🏻🍳. Early Sample Of The New Omni Video Model: (I'd put this slightly above Runway Gen-3)"
Editing capabilities: object swaps in anime clips
@testingcatalog noted:
"I won't lie, this is one of the best video models I have seen, maybe not the best, but a really strong performance. I was particularly impressed by the prompt adherence (except for the one shot with the missing centerpiece), the model's ability to swap objects in anime clips, and the overall motion quality."
This suggests Gemini Omni can:
- Remix existing videos — not just generate from scratch
- Edit in chat — conversational video editing workflows
- Swap objects — targeted edits without full regeneration
- Maintain coherence across edits
How Gemini Omni compares to other video models
Based on early feedback, here's how Gemini Omni is being positioned relative to other frontier video generation models:
| Model | Strengths (per early reports) | Weaknesses (per early reports) |
|---|---|---|
| Gemini Omni | Strong prompt adherence, smooth motion, editing in chat, object swaps, math coherence (complex scenes), voice quality | Minor motion glitches, some missing elements (e.g., "missing centerpiece" in one shot) |
| Runway Gen-3 | High visual quality, cinematic feel | Less conversational editing, no chat interface |
| Pika 2.0 | Fast generation, good for short clips | Less prompt adherence for complex scenes |
| Kling (Kuaishou) | Strong motion dynamics, longer videos | Less accessible (China-focused rollout) |
| OpenAI Sora | Impressive samples, strong physics | Access limited by plan and region |
| Luma Dream Machine | Fast, accessible, good quality | Less control over editing |
Net: Early testers place Gemini Omni in the top tier of video models they have tried, with @testingcatalog singling out prompt adherence and object-swap editing and @chatgpt21 ranking one sample slightly above Runway Gen-3.
What "Omni" likely means: multimodal unification
The "Omni" branding is significant—it suggests Google is positioning this as a unified multimodal model that handles text, image, video, and voice in an integrated way, similar to:
- OpenAI GPT-4o ("o" for omni) — unified text, image, and voice
- Google Gemini 1.5 Pro — long-context multimodal reasoning
- Anthropic Claude 4.7 with vision — multimodal but not video generation
What "Omni" likely means for Gemini:
- Video generation is not a separate model — it's integrated into the core Gemini architecture
- Edit directly in chat — leverage Gemini's conversational reasoning to guide video edits
- Cross-modal reasoning — e.g., "show me the key moment from this video and explain what happened"
- Unified API — developers can generate, edit, and analyze videos within the same Gemini API call
If that reading is right, this is a structural bet on video as a first-class modality in LLMs, not a bolt-on feature.
Usage limits and pricing hints
@testingcatalog noted that Gemini Omni samples are tied to high usage limits in the Gemini app, suggesting:
- Premium feature — likely part of Gemini Advanced or a new "Gemini Ultra" tier
- Compute-intensive — video generation is expensive, so high limits indicate Google is targeting enterprise and creator use cases
- Not free-tier — unlike Gemini 1.5 Flash, which is broadly accessible, Omni will likely require paid access
For comparison:
- Runway Gen-3 costs ~$0.10-0.20 per second of video
- Pika offers limited free generations, then paid plans
- Luma Dream Machine has free and paid tiers
Google may follow a similar model, or bundle Gemini Omni into an existing paid subscription such as Google One AI Premium.
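To make per-second pricing concrete, here is a minimal Python sketch that turns a per-second rate range into a cost range for one clip. The rates used are the approximate Runway Gen-3 figures quoted above. Gemini Omni pricing has not been announced, so the function name `clip_cost_range` and any rate you substitute for Omni are assumptions for illustration only.

```python
def clip_cost_range(seconds: float, low_rate: float, high_rate: float) -> tuple[float, float]:
    """Return the (min, max) cost in dollars for a clip billed per second."""
    if seconds < 0 or low_rate < 0 or high_rate < low_rate:
        raise ValueError("expected seconds >= 0 and 0 <= low_rate <= high_rate")
    # Round to whole cents so the result reads like a price.
    return (round(seconds * low_rate, 2), round(seconds * high_rate, 2))


# Approximate Runway Gen-3 range quoted above: $0.10 to $0.20 per second.
low, high = clip_cost_range(15, 0.10, 0.20)
print(f"15-second clip: ${low:.2f} to ${high:.2f}")
```

At those rates a 15-second clip lands between $1.50 and $3.00, which is a useful baseline once Google publishes its own numbers.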
What this leak tells us about Google I/O 2026
The timing of this leak, one week before Google I/O 2026, may well be deliberate pre-event marketing, similar to:
- OpenAI's GPT-4o leak before the Spring Update event
- Anthropic's Opus 4.7 teasers before official launch
- Meta's Llama 3.1 405B leak shortly before its official release
What we can expect at Google I/O (May 19-20):
- Official Gemini Omni announcement — likely part of the keynote
- API access — developers will get access through Vertex AI and Google AI Studio
- Pricing details — per-second or per-generation costs
- Integration with Google Workspace — e.g., generate videos in Google Slides, Docs with video narration
- Gemini Advanced or Gemini Ultra tier — bundling Omni with other premium features
- Developer tools — templates, editing workflows, conversational video editing APIs
Related announcements likely include:
- A next-generation Gemini reasoning model
- Gemini Code (competitor to GitHub Copilot and Claude Code)
- Gemini for Enterprise (security, compliance, on-prem options)
- Google Cloud AI Platform updates (Vertex AI, TPU v7, etc.)
Practical use cases for Gemini Omni
If Gemini Omni delivers on the early samples, here are the most compelling use cases:
1. Marketing and social media
Prompt: "Generate a 15-second video of our product being used in a coffee shop,
modern aesthetic, shot from multiple angles"
- No need for stock footage or expensive shoots
- Edit directly in chat to tweak lighting, angles, or pacing
2. Educational content
Prompt: "Show a video of how photosynthesis works, with zooming into a leaf cell,
chloroplasts visible, narrated explanation"
- Complex scientific concepts visualized instantly
- If the reported math coherence holds up, on-screen labels and figures should stay accurate
3. Product demos
Prompt: "Create a video demo of our app's onboarding flow,
showing a user's hand tapping through screens"
- Rapid prototyping without filming or screen recording
- Iterate in chat to adjust timing or UI elements
4. Creative storytelling
Prompt: "Generate a scene where a detective enters a rainy noir-style alley,
camera pans from above, neon signs reflecting in puddles"
- Cinematic quality for indie filmmakers and creators
- Object swaps to change props, characters, or settings
5. A/B testing video ads
Prompt: "Create three variations of this ad with different color grading and pacing"
- Rapid iteration for performance marketing
- Test creative hypotheses before expensive production
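For A/B tests that go beyond a few hand-written variations, the prompts themselves can be generated. The sketch below builds one prompt per combination of color grading and pacing using Python's `itertools.product`. The helper name `ad_variants`, the prompt wording, and the option lists are illustrative assumptions; nothing here calls a Gemini API.

```python
from itertools import product


def ad_variants(base_prompt: str, gradings: list[str], pacings: list[str]) -> list[dict]:
    """Build one labeled prompt for every (color grading, pacing) combination."""
    return [
        {"id": f"v{i}", "prompt": f"{base_prompt} Color grading: {g}. Pacing: {p}."}
        for i, (g, p) in enumerate(product(gradings, pacings), start=1)
    ]


variants = ad_variants(
    "Create a variation of this ad.",
    gradings=["warm", "cool", "high-contrast"],
    pacings=["slow", "fast"],
)
for v in variants:
    print(v["id"], v["prompt"])
```

A full grid (here 3 x 2 = 6 prompts) keeps each factor separable when comparing results, at the cost of more generations than a single three-variant request.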
How developers should prepare for Gemini Omni
If Google announces Gemini Omni at I/O 2026, here's how to prepare:
1. Explore the API early
- Sign up for Google AI Studio or Vertex AI access
- Test prompt engineering patterns for video generation
- Benchmark costs against Runway, Pika, and other tools
2. Build conversational video editing workflows
- Leverage Gemini's chat interface for iterative edits
- Design workflows where users can refine videos in natural language
- Integrate with existing tools (e.g., video editing in Figma, Notion, Slack)
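One way to think about a conversational editing workflow is as a session that records each natural-language instruction together with the video version it produced, so users can step back when an edit goes wrong. The Python sketch below is purely hypothetical: `EditSession`, `EditTurn`, and `fake_generate` are invented names, and the generator is a local stand-in rather than any real Gemini endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EditTurn:
    instruction: str  # natural-language edit request from the user
    result_id: str    # hypothetical ID of the video version this turn produced


@dataclass
class EditSession:
    source_id: str  # hypothetical ID of the original video
    turns: list[EditTurn] = field(default_factory=list)

    def current_id(self) -> str:
        """ID of the latest version, or the source if no edits exist."""
        return self.turns[-1].result_id if self.turns else self.source_id

    def apply(self, instruction: str, generate: Callable[[str, str], str]) -> str:
        """Pass the current version and instruction to `generate`, then record the result."""
        result_id = generate(self.current_id(), instruction)
        self.turns.append(EditTurn(instruction, result_id))
        return result_id

    def undo(self) -> str:
        """Drop the most recent edit, if any, and return the version now in effect."""
        if self.turns:
            self.turns.pop()
        return self.current_id()


def fake_generate(video_id: str, instruction: str) -> str:
    # Stand-in for a real video-editing backend; no API is called here.
    return f"{video_id}+edit"


session = EditSession(source_id="clip-001")
session.apply("Swap the wine glasses for coffee mugs", fake_generate)
session.apply("Make the lighting warmer", fake_generate)
print(session.current_id())  # clip-001+edit+edit
session.undo()
print(session.current_id())  # clip-001+edit
```

Injecting the generator as a callable keeps the session logic testable today and lets a real backend be swapped in if and when an API ships.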
3. Combine with other Gemini capabilities
- Long-context reasoning — generate videos from entire documents or transcripts
- Multimodal search — find moments in videos and remix them
- Voice integration — narrate videos using Gemini's voice synthesis
4. Monitor pricing and usage limits
- Video generation is expensive—track costs carefully
- Consider hybrid workflows (Gemini for generation, cheaper models for iteration)
- Evaluate Google One AI Premium if bundled access is cheaper
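Cost tracking for a hybrid workflow can start as a simple budget guard that refuses any generation that would exceed a cap. The sketch below works in whole cents to avoid floating-point drift. The class name `VideoBudget` and both per-second rates are placeholder assumptions, not published prices for any model.

```python
class VideoBudget:
    """Track video-generation spend in whole cents against a fixed cap."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.spent_cents = 0

    @property
    def remaining_cents(self) -> int:
        return self.cap_cents - self.spent_cents

    def charge(self, seconds: int, cents_per_second: int) -> int:
        """Record one generation; refuse it if it would exceed the cap."""
        cost = seconds * cents_per_second
        if cost > self.remaining_cents:
            raise RuntimeError(f"need {cost} cents, only {self.remaining_cents} left")
        self.spent_cents += cost
        return cost


# Hypothetical rates: a cheaper model for drafts, a premium model for the final render.
DRAFT_RATE = 5   # cents per second (placeholder)
FINAL_RATE = 20  # cents per second (placeholder)

budget = VideoBudget(cap_cents=1000)  # $10.00 cap
for _ in range(3):                    # three 10-second drafts
    budget.charge(10, DRAFT_RATE)
budget.charge(10, FINAL_RATE)         # one 10-second final render
print(f"spent ${budget.spent_cents / 100:.2f}, remaining ${budget.remaining_cents / 100:.2f}")
```

With these placeholder rates, three drafts plus one final render cost $3.50 and leave $6.50 of the cap, which shows how much iteration headroom cheaper drafts can buy.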
How Gemini Omni fits into Google's AI strategy
Gemini Omni is part of a broader push to make Gemini the default multimodal foundation for Google's ecosystem:
| Product | Gemini integration | Video capability |
|---|---|---|
| Gemini Advanced | Core reasoning model | Likely gets Omni access |
| Google Workspace | Docs, Sheets, Slides, Gmail | Generate videos in Slides, narrate Docs |
| YouTube | Video understanding, summaries | Remix and edit YouTube videos directly |
| Google Cloud | Vertex AI, Gemini API | Enterprise video generation at scale |
| Android | On-device Gemini Nano | Local video editing on Pixel devices |
| Google Search | AI Overviews | Generate video explainers in search results |
This is a platform play—Google wants video generation to be as accessible as text generation across all its products.
Related on ExplainX
- OpenAI Daybreak: Codex Security for cyber defense (OpenAI's May 12, 2026 announcement, made the same day as the Gemini Omni leak)
- Runway Characters: real-time video agents GWM-1 — Runway's video generation platform
- How diffusion image generation works — technical foundations of generative models
- Google Cloud Next 2026: TPU 8, Gemini Enterprise, Agent Platform — Google's AI infrastructure strategy
Bottom line
Gemini Omni is Google's unreleased video generation model that surfaced in early Gemini app tests just days before Google I/O 2026. Early samples show strong prompt adherence, smooth motion, editing in chat, and object swaps, with one tester placing it slightly above Runway Gen-3 in quality.
The "Omni" branding suggests Google is unifying video generation with Gemini's reasoning capabilities, making video a first-class modality across its ecosystem. The model is tied to high usage limits, indicating it will likely be a premium feature in Gemini Advanced or a new Gemini Ultra tier.
If Google announces Gemini Omni at I/O 2026 on May 19-20, expect:
- API access through Vertex AI and Google AI Studio
- Pricing details (likely per-second or per-generation)
- Integration with Google Workspace (Slides, Docs) and YouTube
- Developer tools for conversational video editing
For creators, marketers, and developers, Gemini Omni could represent a step-function improvement in accessible, high-quality video generation, with chat-based editing as a potential killer feature that distinguishes it from Runway, Pika, and other tools.
Watch the Google I/O 2026 keynote on May 19 for the official announcement.
Early reports via X: @testingcatalog, @chatgpt21, @chetaslua. Google I/O 2026: May 19-20. ExplainX is not affiliated with Google.