What is Grok Imagine Video 1.5?

Grok Imagine Video 1.5 is xAI's latest AI image-to-video model, released on June 17, 2026. It generates video at up to 720p and 24 FPS from text prompts or still images, with native synchronized audio produced in the same pass. It currently ranks #1 on the Image-to-Video Arena leaderboard, a +52 Elo improvement over version 1.0.

How does Grok Imagine Video 1.5 compare to Sora 2 and Veo 3.1?

In blind user testing on the Image-to-Video Arena, Grok Imagine 1.5 outranks Sora 2, Veo 3.1, Seedance 2.0, and Kling. It also costs significantly less — $4.20 per minute versus Sora 2 Pro at $30/min (86% cheaper) and Veo 3.1 at $12/min (65% cheaper).

Does Grok Imagine Video 1.5 generate audio automatically?

Yes. Grok Imagine Video 1.5 generates synchronized sound effects, background audio, and lip-synced speech alongside the video in a single pass, and lip-sync accuracy is improved over version 1.0. No separate audio generation step is required.

What resolution does Grok Imagine Video 1.5 output?

The model outputs 480p or 720p video at 24FPS. The 720p resolution cap is its main technical limitation for professional productions requiring 1080p or higher output.

How long are the videos generated by Grok Imagine Video 1.5?

Base clips run from 1 to 15 seconds. The Extend from Frame feature adds 6 to 10 seconds per extension, allowing longer sequences to be built incrementally.
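To make the incremental build-up concrete, here is a minimal sketch of the arithmetic. It assumes each extension simply appends its full length to the clip with no overlap, which the article does not confirm, and the function name `extensions_needed` is purely illustrative.

```python
import math

BASE_MAX_S = 15   # longest base clip, per the figures above
EXT_MIN_S = 6     # shortest Extend from Frame extension
EXT_MAX_S = 10    # longest Extend from Frame extension


def extensions_needed(target_s: float, base_s: float = BASE_MAX_S) -> tuple[int, int]:
    """Return (fewest, most) extensions needed to reach target_s seconds.

    Assumption: every extension adds its full length, with no overlap.
    """
    if not 1 <= base_s <= BASE_MAX_S:
        raise ValueError("base clip must be between 1 and 15 seconds")
    remaining = max(0.0, target_s - base_s)
    fewest = math.ceil(remaining / EXT_MAX_S)  # every extension is 10 s
    most = math.ceil(remaining / EXT_MIN_S)    # every extension is 6 s
    return fewest, most


print(extensions_needed(60))  # (5, 8)
```

Under that assumption, a one-minute sequence built on a 15-second base clip takes between five and eight extensions.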

What animation styles does Grok Imagine Video 1.5 support?

The model supports four animation style modes — Normal, Fun, Custom, and Spicy — which define the overall tone and energy of the output. Camera movements and pacing can be directed via natural-language prompts.

Where can I use Grok Imagine Video 1.5?

Grok Imagine Video 1.5 is available at grok.com/imagine. Elon Musk confirmed wide release on June 17, 2026. An API is also available for developers building video generation into their own applications.

Grok Imagine Video 1.5: xAI Launches #1 AI Video Generator (June 2026) | explainx.ai Blog


At 9:25 AM on June 17, 2026, Elon Musk posted two words, "wide release," and dropped a link to grok.com/imagine. Twenty minutes earlier, xAI had published a thread announcing Grok Imagine Video 1.5: a new image-to-video model with native synchronized audio, sharper physics, and faster generation times. By the afternoon, the announcement had accumulated more than 268,000 views, and the model was being used to generate everything from cinematic clips to creative experiments.

The broader context: Grok Imagine Video 1.5 is not just an incremental update to xAI's video tooling. It is the model that currently sits at #1 on the Image-to-Video Arena leaderboard — beating Sora 2, Veo 3.1, Seedance 2.0, and Kling in blind user testing — at a price point that undercuts every major competitor by a wide margin.

What Grok Imagine Video 1.5 Actually Does

At its core, Grok Imagine Video 1.5 takes an input — a text prompt, a still image, or both — and produces a video clip with synchronized audio. The mechanics:

Resolution: 480p or 720p output
Frame rate: 24FPS
Clip length: 1 to 15 seconds base; 6–10 second extensions via Extend from Frame
Audio: Native synchronized sound effects, background audio, and lip-sync generated in a single pass — no separate step
Animation modes: Normal, Fun, Custom, or Spicy to set the overall tone
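The limits listed above can be captured as a small validation sketch. This is not the xAI API: `ClipRequest` and all of its field names and defaults are hypothetical, invented here only to show how the published constraints (resolution, clip length, animation mode, prompt and/or image input) fit together.

```python
from dataclasses import dataclass
from typing import Optional

RESOLUTIONS = {"480p", "720p"}                 # supported output resolutions
MODES = {"Normal", "Fun", "Custom", "Spicy"}   # animation style modes
MIN_SECONDS, MAX_SECONDS = 1, 15               # base clip length range


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical container for one base-clip request (not a real API type)."""
    prompt: Optional[str] = None
    image_path: Optional[str] = None
    resolution: str = "720p"
    duration_s: float = 5.0   # arbitrary illustrative default
    mode: str = "Normal"

    def __post_init__(self) -> None:
        # The model accepts a text prompt, a still image, or both.
        if not self.prompt and not self.image_path:
            raise ValueError("provide a prompt, an image, or both")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
        if not MIN_SECONDS <= self.duration_s <= MAX_SECONDS:
            raise ValueError("base clips run from 1 to 15 seconds")
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}")


request = ClipRequest(prompt="slow dolly shot through a rainy street", mode="Custom")
print(request.resolution, request.duration_s)
```

Asking for `resolution="1080p"` raises a `ValueError` in this sketch, mirroring the 720p cap.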

The underlying architecture is Aurora, xAI's autoregressive video generation model. The autoregressive approach is what gives it character consistency across frames — faces do not warp between cuts, and camera movements (pans, dolly moves, tracking shots) execute cleanly without the stuttering common in earlier diffusion-based video models.

The key additions in 1.5 over 1.0:

Sharper realism: Physics simulation is more accurate — fabric moves naturally, liquids behave with weight, lighting changes are consistent across camera transitions
Better lip-sync: Accuracy in matching spoken audio to mouth movements improved significantly
Faster generation: Clips render more quickly than in 1.0, though specific throughput numbers have not been published
+52 Elo on the arena: The jump from 1.0 to 1.5 is the largest single-version improvement in the benchmark's history for any model in the image-to-video category
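For a sense of scale, a 52-point Elo gap can be translated into an expected head-to-head win rate. The sketch below assumes the arena uses the standard Elo logistic curve with a 400-point scale; the article does not state the leaderboard's exact rating formula, so treat the result as a rough guide.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected win probability for the higher-rated side under standard Elo."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / scale))


# Version 1.5 versus version 1.0, using the +52 Elo gap quoted above.
print(f"{expected_win_rate(52):.1%}")  # 57.4%
```

In other words, under this assumption voters would be expected to prefer a 1.5 output over a 1.0 output in roughly 57 of 100 blind comparisons.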

Benchmark Position: How It Ranks Against the Competition

The Image-to-Video Arena is the current industry standard for comparing AI video generators — it uses blind user voting (users see two outputs without knowing which model generated which and pick the better one). As of the 1.5 release:

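Model	Arena Elo	Rank
Grok Imagine Video 1.5 (720p)	1,473	#1
Seedance 2.0	Lower than Grok Imagine Video 1.5 (score not given)	#2
HappyHorse 1.0	Lower than Seedance 2.0 (score not given)	#3
Google Veo 3.1	Lower than Grok Imagine Video 1.5 (score not given)	Top 5
Sora 2	Lower than Grok Imagine Video 1.5 (score not given)	Top 5
Kling 3.0	Lower than Grok Imagine Video 1.5 (score not given)	Top 5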

Model	Price per Minute of Video
Grok Imagine Video 1.5	$4.20
Veo 3.1	$12.00
Sora 2 Pro	$30.00
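The per-minute prices translate into per-clip costs and relative savings with simple arithmetic. The sketch below assumes billing scales linearly with clip length, which the pricing table does not confirm; the helper names are illustrative.

```python
PRICE_PER_MIN = {            # USD per minute of video, from the table above
    "Grok Imagine Video 1.5": 4.20,
    "Veo 3.1": 12.00,
    "Sora 2 Pro": 30.00,
}


def clip_cost(model: str, seconds: float) -> float:
    """Cost in USD of one clip, assuming linear per-second billing."""
    return PRICE_PER_MIN[model] * seconds / 60


def savings_pct(cheaper: str, pricier: str) -> float:
    """How much cheaper one model is than another, as a percentage."""
    return (1 - PRICE_PER_MIN[cheaper] / PRICE_PER_MIN[pricier]) * 100


grok = "Grok Imagine Video 1.5"
print(f"${clip_cost(grok, 15):.2f}")               # $1.05 for a 15-second clip
print(f"{savings_pct(grok, 'Sora 2 Pro'):.0f}%")   # 86%
print(f"{savings_pct(grok, 'Veo 3.1'):.0f}%")      # 65%
```

The 86% and 65% figures match the savings quoted for Sora 2 Pro and Veo 3.1.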

