Grok 4.5 is xAI's latest model, built on the 1.5T V9 foundation model with Cursor IDE coding data added during supplemental training. It entered private beta at SpaceX and Tesla on June 28, 2026, with early evaluations showing performance close to or exceeding Anthropic's Claude Opus.

What is the Grok Build harness?

The Grok Build harness is xAI's internal training and evaluation infrastructure that Elon Musk referenced as continuing to show "daily advancements." It is likely xAI's agentic coding and evaluation pipeline, similar in concept to how frontier labs test models on software engineering and tool-use benchmarks before public release.

Why was Cursor data used in Grok 4.5 training?

Cursor is a widely used AI coding IDE built on top of frontier models. Training on Cursor interaction data provides a rich signal of how developers actually use AI to write, debug, and review code — real-world agentic software engineering tasks that synthetic benchmarks often miss.

How does Grok 4.5 compare to Claude Opus?

According to Elon Musk's announcement, early evaluations show Grok 4.5 performance "close to, perhaps exceeding" Claude Opus. These are internal evaluations and have not yet been independently verified on public benchmarks. Independent developers who tested an early build described the vibes as "similar to Opus."

When will Grok 4.5 be publicly available?

Elon Musk announced on July 8, 2026 that SpaceXAI will release Grok 4.5 to the public on July 9, 2026. See the full launch guide at /blog/grok-4-5-public-launch-spacexai-july-2026. Before that, it was limited to private beta at SpaceX and Tesla from June 28, 2026.

Grok 4.5 Enters Private Beta at SpaceX and Tesla — Built | explainx.ai Blog


TL;DR

Update — July 9, 2026: Grok 4.5 live in Cursor with published benchmarks and pricing. Cursor launch guide →

Update — July 8, 2026: Grok 4.5 goes public July 9 — Musk says Opus-class, faster, cheaper. Full launch coverage: Grok 4.5 public launch.

Update — June 30, 2026: The day after Grok 4.5's beta, Cursor shipped Cursor for iOS — not the from-scratch Composer model. See Cursor for iOS launch (June 29).

On June 28, 2026, Elon Musk announced that Grok 4.5 — built on xAI's 1.5T V9 foundation model with Cursor IDE coding data added in supplemental training — has entered private beta at SpaceX and Tesla. Early internal evaluations show performance "close to, perhaps exceeding" Anthropic's Claude Opus. Reinforcement learning is ongoing, and the Grok Build harness is showing daily improvements. SpaceX also plans to ship completely new models from scratch monthly for the rest of 2026.

What Musk Announced

The announcement came directly from Elon Musk on X:

"Grok 4.5, based on our 1.5T V9 foundation model, with Cursor data added in supplemental training, is now in private beta at SpaceX & Tesla. Early evals show performance close to, perhaps exceeding Opus. RL is continuing to significantly improve the model, and the Grok Build harness is showing daily advancements."

Three details here are worth unpacking separately:

The 1.5T V9 foundation model — xAI's underlying architecture, now at 1.5 trillion parameters
Cursor data in supplemental training — coding interaction data from one of the most popular AI IDEs
Opus as the benchmark — Claude Opus is Anthropic's most capable reasoning model, the bar Grok 4.5 is being measured against

Why Cursor Data Matters

Cursor is an AI-native IDE used by hundreds of thousands of developers. When xAI says they added "Cursor data" in supplemental training, they almost certainly mean real developer interaction data — how engineers actually prompt AI to write code, debug issues, review diffs, and build software end-to-end.

This is a fundamentally different signal than synthetic benchmarks. Real Cursor sessions capture:

Agentic multi-turn workflows — a developer instructs the model, sees output, corrects it, iterates
Context window pressure — large codebases that stress memory and retrieval
Production code patterns — not toy examples, but real-world TypeScript, Python, Rust, Go
Error recovery — how models handle and fix compilation errors, test failures, and runtime issues

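For coding models, this kind of data is gold. It records the full loop of prompt, output, correction, and retry, which static code corpora do not contain, and that is the plausible reason xAI added it in supplemental training. The announcement gives no detail on how much Cursor data was used or how it was weighted.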

Compare this to how Claude models are benchmarked on SWE-Bench and DeepSWE — real software engineering tasks that require multi-step agentic reasoning. Grok 4.5 appears to be targeting exactly this category.

The V9 Foundation Model: What We Know

The 1.5T V9 designation tells us xAI is operating at the upper end of parameter scale. For context:

Model	Parameters (approx.)
Grok 4.5 (V9)	1.5T
GPT-5.6	Not disclosed
Claude Fable 5	Not disclosed
DeepSeek V4 Pro	~671B (MoE)

Large dense parameter counts are not always better than sparse Mixture-of-Experts (MoE) architectures: DeepSeek V4 Pro demonstrated that MoE efficiency can match or beat dense models at a fraction of the compute. Musk's announcement does not say whether V9 is dense or MoE. Either way, paired with quality training data (including Cursor) and ongoing RL, a 1.5T-parameter model has substantial headroom.
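To see why total parameter count alone says little about compute cost, here is a rough sketch of how many parameters a model touches per token. The function name, the even split of weights across experts, and every setting below (shared fraction, expert count, experts routed per token) are illustrative assumptions, not published figures for V9 or DeepSeek V4 Pro.

```python
def active_params(total: float, shared_fraction: float,
                  num_experts: int, experts_per_token: int) -> float:
    """Approximate parameters used per token in a simplified MoE model."""
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    if not 1 <= experts_per_token <= num_experts:
        raise ValueError("experts_per_token must be between 1 and num_experts")
    shared = total * shared_fraction   # e.g. attention and embeddings: always used
    expert_pool = total - shared       # assumed to be split evenly across experts
    return shared + expert_pool * experts_per_token / num_experts


TOTAL = 1.5e12  # the 1.5T figure from the announcement, read as parameters

# A dense model is the degenerate case: one "expert" that is always selected.
dense_active = active_params(TOTAL, shared_fraction=0.1,
                             num_experts=1, experts_per_token=1)

# Hypothetical sparse configuration: 64 experts, 4 routed per token.
moe_active = active_params(TOTAL, shared_fraction=0.1,
                           num_experts=64, experts_per_token=4)

print(f"dense: {dense_active / 1e9:.0f}B active, MoE: {moe_active / 1e9:.0f}B active")
```

Under these made-up settings, the sparse configuration uses roughly 234B of its 1.5T parameters per token, while the dense one uses all of them. That gap is why the dense versus MoE question matters more for inference cost than the headline number does.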

Grok Build Harness

Musk referenced "daily advancements" in the Grok Build harness — xAI's internal training and evaluation pipeline for agentic tasks. This is xAI's equivalent of the harness-based evaluation systems that frontier labs use for agent benchmarks.

Update — July 16, 2026: SpaceXAI open-sourced Grok Build under Apache 2.0 — the harness behind this beta is now readable: Grok Build open source guide.

A build harness typically runs the model against a suite of agentic tasks — write code, run it, check output, fix bugs — in an automated loop. Daily advancements suggest xAI is in an active RL training phase where the model is improving rapidly on this task distribution.
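The write, run, check, fix loop described above can be sketched in a few lines of Python. This is a minimal illustration of the general pattern, not Grok Build itself: the Model callable, run_candidate, attempt_task, and toy_model are all hypothetical names, and a real harness would sandbox execution far more carefully than a bare subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical interface: a "model" maps (task prompt, feedback from the
# previous failed attempt) to a string of Python source code.
Model = Callable[[str, str], str]


def run_candidate(source: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run candidate code in a subprocess; return (passed, captured stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stderr


def attempt_task(model: Model, prompt: str, checks: str, max_turns: int = 3) -> int:
    """Return the 1-based turn on which the checks passed, or 0 if they never did."""
    feedback = ""
    for turn in range(1, max_turns + 1):
        code = model(prompt, feedback)
        passed, stderr = run_candidate(code + "\n" + checks)
        if passed:
            return turn
        feedback = stderr  # feed the failure back into the next attempt
    return 0


def toy_model(prompt: str, feedback: str) -> str:
    """Stand-in for a real model: wrong at first, corrected once it sees an error."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


turns_needed = attempt_task(toy_model, "write add(a, b)", "assert add(2, 3) == 5")
print("passed on turn", turns_needed)
```

The turn count returned per task is the kind of signal an RL phase could be trained against: fewer turns to a passing result across the task suite would show up as the day-over-day improvement Musk describes.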

SpaceX and Tesla as Private Beta Environments

Choosing SpaceX and Tesla as the beta environments is deliberate. Both companies have massive internal software engineering needs:

SpaceX: Flight software, simulation, avionics, embedded systems, data pipelines for Starship and Starlink
Tesla: Autopilot/FSD codebases, manufacturing automation, energy management, Dojo supercomputer software

These are not standard enterprise software stacks. They involve safety-critical code, unusual hardware constraints, and domain-specific requirements. Testing Grok 4.5 in these environments gives xAI production-grade evaluation at scale, on work that is far harder than standard coding benchmarks.

How It Compares to Claude Opus

Musk's claim that Grok 4.5 is "close to, perhaps exceeding Opus" needs context.

Claude Opus (part of the Fable 5 family) is Anthropic's most capable reasoning model, known for:

Long-horizon multi-step reasoning
Precise tool use and code analysis
Strong performance on agentic benchmarks
The foundation for Claude Mythos' security capabilities

The early independent reaction on X aligned with Musk's claim. Developer Mehul Mohan, who tested an early build, described the vibes as "similar to Opus." This is anecdotal but consistent with the internal eval framing.

What remains unverified: public benchmark scores on SWE-Bench, HumanEval, GPQA, or any of the standard evaluation suites that allow direct comparison.
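For a sense of what a directly comparable number looks like, HumanEval results are conventionally reported as pass@k: the probability that at least one of k sampled solutions passes the unit tests. The sketch below uses the standard unbiased estimator from the HumanEval paper. The function name and the sample counts are illustrative, and nothing in the announcement says xAI's internal evals are scored this way.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: sample budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so some draw must pass
    # 1 minus the chance that all k draws come from the n - c failing samples
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples for one problem, 3 of them correct.
single_try = pass_at_k(n=10, c=3, k=1)
five_tries = pass_at_k(n=10, c=3, k=5)
print(f"pass@1 = {single_try:.3f}, pass@5 = {five_tries:.3f}")
```

A benchmark score is the average of this value over all problems. Published results of this kind, computed with a stated n and k, are what would let outside readers check the "close to, perhaps exceeding" claim.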

Monthly New Models from SpaceX Through 2026

Perhaps the most ambitious claim accompanying the announcement is easy to miss: SpaceX plans to release completely new models, trained from scratch, every month for the rest of 2026.

This is a remarkable cadence. Training a 1.5T model from scratch takes significant compute and time even for a well-resourced lab. If accurate, it implies xAI has:

Sufficient GPU capacity (likely Colossus cluster) to run parallel training runs
A streamlined data pipeline that can turn around new training datasets monthly
Confidence that the Grok Build RL harness can rapidly improve each base model post-training

Monthly new model releases would put xAI on a faster iteration cycle than any other frontier lab has publicly committed to.

What This Means for the AI Race

Grok 4.5 is the latest signal in what has become an extraordinarily compressed AI race in 2026. Earlier this year:

DeepSeek V4 Pro disrupted pricing expectations
GLM-5.2 from Zhipu reportedly matched Claude Mythos on security benchmarks
Claude Fable 5 launched with Anthropic's biggest capability leap yet
GPT-5.6 pushed OpenAI's frontier further
Alibaba's Qwen 3.7-Max set new records on long-horizon agent benchmarks

Grok 4.5 positions xAI as a genuine player in the top tier — not just a social media AI, but a model targeting the most demanding agentic coding tasks in production environments.

For developers, the practical implication is that Opus-class coding capability may soon be available from multiple providers, increasing competition and likely driving down costs.

What to Watch

Public benchmark release: Will xAI publish Grok 4.5 scores on SWE-Bench, HumanEval, or GPQA before the public launch?
Cursor integration: Given the Cursor training data angle, will xAI partner with Cursor or release Grok 4.5 as a selectable model in the IDE?
Polymarket probability shift: The market currently gives a 14% chance that a non-US lab leads AI by year-end. A public Grok 4.5 release that matches Opus would move probability toward US labs rather than away from them.
Monthly model cadence: Can SpaceX actually ship a new foundation model every month? The first few releases will test that claim.
Open weights possibility: The announcement makes no mention of open weights, but xAI has released open Grok models before. If V9 weights drop, the developer ecosystem impact would be enormous.

Bottom Line

Grok 4.5 entering private beta at SpaceX and Tesla is a credible frontier-model announcement. The combination of a 1.5T parameter base, real-world Cursor interaction data, ongoing RL improvements, and production testing in safety-critical environments is a serious technical approach — not just a benchmark chase.

Whether it truly matches or exceeds Claude Opus won't be known until independent benchmarks surface. But the direction is clear: xAI is targeting the same agentic coding and reasoning niche that Anthropic, OpenAI, and DeepSeek are all competing in — and doing it with access to production environments no other lab can replicate.

Reported based on Elon Musk's announcement on X as of June 28, 2026. Independent benchmark verification of Grok 4.5's performance claims was not available at time of publication.

Grok 4.5 Enters Private Beta at SpaceX and Tesla — Built on 1.5T V9 Model With Cursor Data
