Gemini Omni Video Model emerges in early Gemini app tests: remix videos, edit in chat, and generate impressive samples ahead of Google I/O 2026
Google's unreleased Gemini Omni video model was spotted in early Gemini app tests on May 12, 2026, letting users remix videos, edit directly in chat, and generate impressive samples from simple prompts. Early feedback praises math coherence, voice quality, and editing features, with samples showing suited men dining oceanside with shifting camera angles. The model is tied to high usage limits and hints at a major upgrade ahead of Google I/O on May 19-20.
On May 12, 2026, just days before Google I/O 2026 (scheduled for May 19-20), early evidence of Google's unreleased Gemini Omni video model surfaced in the Gemini mobile app.
According to reports and leaked samples, Gemini Omni allows users to remix videos, edit directly in chat, and generate impressive video samples from simple text prompts—with early feedback praising prompt adherence, motion quality, voice coherence, and editing capabilities like object swaps.
The leak hints at a major video generation upgrade that potentially unifies video creation with Gemini's reasoning capabilities, positioning Google to compete directly with Runway Gen-3, Pika, Kling, and other frontier video models.
This article breaks down what early testers are seeing, how Gemini Omni compares to existing models, what the leak tells us about Google I/O announcements, and what developers and creators should watch for.
What early testers are seeing in Gemini Omni
According to reports from @testingcatalog, @chatgpt21, and @chetaslua, the unreleased Gemini Omni model appeared in the Gemini mobile app with the following description:
"Meet our new video model. Remix your videos, edit directly in chat, try a template, and more."
Sample generation: suited men dining oceanside
One of the shared samples shows:
Suited men dining at a table next to the ocean
Shifting camera angles (multiple perspectives within a single generation)
Clinking glasses (fine-grained motion and audio coherence)
Oceanside environment with realistic lighting and atmosphere
"I won't lie, this is one of the best video models I have seen, maybe not the best, but a really strong performance. I was particularly impressed by the prompt adherence (except for the one shot with the missing centerpiece), the model's ability to swap objects in anime clips, and the overall motion quality."
This suggests Gemini Omni can:
Remix existing videos — not just generate from scratch
Edit in chat — conversational video editing workflows
Swap objects — targeted edits without full regeneration
Maintain coherence across edits
How Gemini Omni compares to other video models
Based on early feedback, here's how Gemini Omni is being positioned relative to other frontier video generation models:
| Model | Strengths (per early reports) | Weaknesses (per early reports) |
| --- | --- | --- |
| Gemini Omni | Strong prompt adherence, smooth motion, editing in chat, object swaps, math coherence (complex scenes), voice quality | Minor motion glitches, some missing elements (e.g., "missing centerpiece" in one shot) |
| Runway Gen-3 | High visual quality, cinematic feel | Less conversational editing, no chat interface |
| Pika 2.0 | Fast generation, good for short clips | Less prompt adherence for complex scenes |
| Kling (Kuaishou) | Strong motion dynamics, longer videos | Less accessible (China-focused rollout) |
| OpenAI Sora | Impressive samples, strong physics | Not publicly available |
| Luma Dream Machine | Fast, accessible, good quality | Less control over editing |
Net: Early testers are placing Gemini Omni in the top tier of video models, with some rating it slightly ahead of Runway Gen-3 on prompt adherence and editing capabilities. These impressions rest on a small number of leaked samples, not on side-by-side benchmarks.
What "Omni" likely means: multimodal unification
The "Omni" branding is significant—it suggests Google is positioning this as a unified multimodal model that handles text, image, video, and voice in an integrated way, similar to:
OpenAI GPT-4o ("o" for omni) — unified text, image, and voice
Google Gemini 1.5 Pro — long-context multimodal reasoning
Anthropic Claude 4.7 with vision — multimodal but not video generation
What "Omni" likely means for Gemini:
Video generation is not a separate model — it's integrated into the core Gemini architecture
Edit directly in chat — leverage Gemini's conversational reasoning to guide video edits
Cross-modal reasoning — e.g., "show me the key moment from this video and explain what happened"
Unified API — developers can generate, edit, and analyze videos within the same Gemini API call
This is a structural bet on video as a first-class modality in LLMs, not a bolt-on feature.
Usage limits and pricing hints
@testingcatalog noted that Gemini Omni samples are tied to high usage limits in the Gemini app, suggesting:
Premium feature — likely part of Gemini Advanced or a new "Gemini Ultra" tier
Compute-intensive — video generation is expensive, so high limits indicate Google is targeting enterprise and creator use cases
Not free-tier — unlike Gemini 1.5 Flash, which is broadly accessible, Omni will likely require paid access
For comparison:
Runway Gen-3 costs ~$0.10-0.20 per second of video
Pika offers limited free generations, then paid plans
Luma Dream Machine has free and paid tiers
Google may follow a similar model, or bundle Gemini Omni into Google One AI Premium (rumored to launch at Google I/O).
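To make the per-second figure concrete, here is a rough sketch of what a clip would cost at the Runway Gen-3 range quoted above. The function name and the 15-second clip length are illustrative choices, and Gemini Omni pricing has not been announced, so nothing here reflects Google's actual rates.

```python
def clip_cost_range(seconds: float, low_rate: float, high_rate: float) -> tuple[float, float]:
    """Return the (minimum, maximum) cost in dollars for a clip billed per second."""
    if seconds < 0 or low_rate < 0 or high_rate < low_rate:
        raise ValueError("seconds and rates must be non-negative, with low_rate <= high_rate")
    # Round to cents so floating-point noise does not leak into the result.
    return (round(seconds * low_rate, 2), round(seconds * high_rate, 2))


# Runway Gen-3 range quoted above: roughly $0.10-0.20 per second of video.
low, high = clip_cost_range(15, 0.10, 0.20)
print(f"15-second clip: ${low:.2f} to ${high:.2f}")
```

Swapping in a different per-second range, once Google publishes one, would give a like-for-like comparison.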
What this leak tells us about Google I/O 2026
The timing of this leak, one week before Google I/O 2026, may well be deliberate pre-event marketing, similar to:
OpenAI's GPT-4o leak before the Spring Update event
Anthropic's Opus 4.7 teasers before official launch
Meta's Llama 3.1 preview ahead of Connect
What we can expect at Google I/O (May 19-20):
Official Gemini Omni announcement — likely part of the keynote
API access — developers will get access through Vertex AI and Google AI Studio
Pricing details — per-second or per-generation costs
Integration with Google Workspace — e.g., generate videos in Google Slides, Docs with video narration
Gemini Advanced or Gemini Ultra tier — bundling Omni with other premium features
Developer tools — templates, editing workflows, conversational video editing APIs
Related announcements likely include:
A next-generation Gemini reasoning model
Gemini Code (competitor to GitHub Copilot and Claude Code)
Gemini for Enterprise (security, compliance, on-prem options)
Google Cloud AI Platform updates (Vertex AI, TPU v7, etc.)
Practical use cases for Gemini Omni
If Gemini Omni delivers on the early samples, here are the most compelling use cases:
1. Marketing and social media
Prompt: "Generate a 15-second video of our product being used in a coffee shop, modern aesthetic, shot from multiple angles"
No need for stock footage or expensive shoots
Edit directly in chat to tweak lighting, angles, or pacing
2. Educational content
Prompt: "Show a video of how photosynthesis works, with zooming into a leaf cell, chloroplasts visible, narrated explanation"
Complex scientific concepts visualized instantly
Math coherence ensures accurate representations
3. Product demos
Prompt: "Create a video demo of our app's onboarding flow, showing a user's hand tapping through screens"
Rapid prototyping without filming or screen recording
Iterate in chat to adjust timing or UI elements
4. Creative storytelling
Prompt: "Generate a scene where a detective enters a rainy noir-style alley, camera pans from above, neon signs reflecting in puddles"
Cinematic quality for indie filmmakers and creators
Object swaps to change props, characters, or settings
5. A/B testing video ads
Prompt: "Create three variations of this ad with different color grading and pacing"
Rapid iteration for performance marketing
Test creative hypotheses before expensive production
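For A/B testing in particular, it may help to generate the prompt variations programmatically so each one differs in a controlled way. The sketch below only builds prompt strings; the ad subject and the specific grading and pacing values are hypothetical, and how the prompts would be submitted depends on an interface Google has not yet published.

```python
def ad_variants(base_prompt: str, styles: list[tuple[str, str]]) -> list[str]:
    """Build one prompt per (color grading, pacing) pair, keeping the base prompt fixed."""
    return [
        f"{base_prompt}, {grading} color grading, {pacing} pacing"
        for grading, pacing in styles
    ]


# Hypothetical ad and style pairs, chosen only for illustration.
variants = ad_variants(
    "15-second ad for a reusable water bottle",
    [("warm", "slow"), ("cool", "medium"), ("high-contrast", "fast")],
)
for prompt in variants:
    print(prompt)
```

Holding the base prompt constant and varying one pair of attributes at a time makes it easier to attribute a performance difference to the creative change.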
How developers should prepare for Gemini Omni
If Google announces Gemini Omni at I/O 2026, here's how to prepare:
1. Explore the API early
Sign up for Google AI Studio or Vertex AI access
Test prompt engineering patterns for video generation
Benchmark costs against Runway, Pika, and other tools
2. Build conversational video editing workflows
Leverage Gemini's chat interface for iterative edits
Design workflows where users can refine videos in natural language
Integrate with existing tools (e.g., video editing in Figma, Notion, Slack)
3. Combine with other Gemini capabilities
Long-context reasoning — generate videos from entire documents or transcripts
Multimodal search — find moments in videos and remix them
Voice integration — narrate videos using Gemini's voice synthesis
4. Monitor pricing and usage limits
Video generation is expensive—track costs carefully
Consider hybrid workflows (Gemini for generation, cheaper models for iteration)
Evaluate Google One AI Premium if bundled access is cheaper
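The workflow and cost points above can be combined in a small bookkeeping sketch: a session object that records each natural-language edit and refuses further edits once an estimated budget is used up. Everything here is an assumption for illustration, including the class name, the per-second rate, and the idea that each edit is billed like a full regeneration. It makes no API calls, since no Gemini Omni API exists publicly yet.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Tracks chat-style edit instructions and estimated spend for one video."""

    budget_usd: float
    rate_per_second: float  # assumed price; not an announced Gemini Omni rate
    history: list[str] = field(default_factory=list)
    spent_usd: float = 0.0

    def request_edit(self, instruction: str, clip_seconds: float) -> bool:
        """Record the edit if its estimated cost fits the remaining budget."""
        cost = clip_seconds * self.rate_per_second
        if self.spent_usd + cost > self.budget_usd:
            return False  # over budget: caller can stop or switch to a cheaper tool
        self.history.append(instruction)
        self.spent_usd += cost
        return True


session = EditSession(budget_usd=5.0, rate_per_second=0.10)
results = [
    session.request_edit(instruction, clip_seconds=15)
    for instruction in [
        "make the lighting warmer",
        "slow down the opening shot",
        "swap the mug for a glass",
        "add a closing title card",
    ]
]
print(results, f"spent ${session.spent_usd:.2f}")
```

In a real integration, the point where the instruction is appended to `history` is where the request to the video model would go, and the estimated cost would be replaced by the billed amount.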
How Gemini Omni fits into Google's AI strategy
Gemini Omni is part of a broader push to make Gemini the default multimodal foundation for Google's ecosystem:
| Product | Gemini integration | Video capability |
| --- | --- | --- |
| Gemini Advanced | Core reasoning model | Likely gets Omni access |
| Google Workspace | Docs, Sheets, Slides, Gmail | Generate videos in Slides, narrate Docs |
| YouTube | Video understanding, summaries | Remix and edit YouTube videos directly |
| Google Cloud | Vertex AI, Gemini API | Enterprise video generation at scale |
| Android | On-device Gemini Nano | Local video editing on Pixel devices |
| Google Search | AI Overviews | Generate video explainers in search results |
This is a platform play—Google wants video generation to be as accessible as text generation across all its products.
Gemini Omni is Google's unreleased video generation model that surfaced in early Gemini app tests just days before Google I/O 2026. Early samples show strong prompt adherence, smooth motion, editing in chat, and object swaps, with some testers placing it slightly above Runway Gen-3 on prompt adherence and editing.
The "Omni" branding suggests Google is unifying video generation with Gemini's reasoning capabilities, making video a first-class modality across its ecosystem. The model is tied to high usage limits, indicating it will likely be a premium feature in Gemini Advanced or a new Gemini Ultra tier.
If Google announces Gemini Omni at I/O 2026 on May 19-20, expect:
API access through Vertex AI and Google AI Studio
Pricing details (likely per-second or per-generation)
Integration with Google Workspace (Slides, Docs, YouTube)
Developer tools for conversational video editing
For creators, marketers, and developers, Gemini Omni could represent a step-function improvement in accessible, high-quality video generation, with chat-based editing as a potential killer feature that distinguishes it from Runway, Pika, and other tools.
Watch the Google I/O 2026 keynote on May 19 for the official announcement.