Krea 2 is a series of open-weights foundation models for text-to-image generation, built by Krea AI and released in June 2026. It ranks in the top 10 on the Artificial Analysis text-to-image leaderboard and 2nd among independent labs. The weights and inference code are released under a permissive license.

Does Krea 2 use AI-generated images in training?

No. Krea explicitly excludes all AI-generated images from pretraining. They argue that even a small proportion of synthetic images imposes an upper bound on model quality because synthetic images are easier to learn and introduce distribution biases. They built in-house classifiers to filter them out.

What text encoder does Krea 2 use?

Krea 2 uses Qwen3-VL as its text encoder. They chose a VLM because it offers a richer input space (text and images) and stronger multilingual generalization. They also introduce a shallow attention layer that aggregates features across the VLM's hidden layers instead of using only the last layer. This is combined with lightweight bidirectional transformer layers that reduce autoregressive bias.
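The report does not spell out the aggregation layer here. The sketch below shows one plausible shape for "attention across hidden layers". It assumes a single learned query that scores each layer's hidden state per token, and that the output is the softmax-weighted sum over layers. The function name `aggregate_layers`, the single-query design, and the key projection `w_key` are all illustrative assumptions, not Krea's implementation.

```python
import numpy as np


def softmax(x, axis):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_layers(hidden_states, query, w_key):
    """Attention-pool VLM hidden states across layers (hypothetical sketch).

    hidden_states: (num_layers, seq_len, d_model) stack of per-layer features
    query:         (d_key,) learned query shared by all tokens (assumption)
    w_key:         (d_model, d_key) learned key projection (assumption)
    returns:       (seq_len, d_model), one fused feature per token
    """
    keys = hidden_states @ w_key                      # (L, T, d_key)
    scores = keys @ query / np.sqrt(query.shape[0])   # (L, T)
    weights = softmax(scores, axis=0)                 # normalize over layers
    # Weighted sum over the layer axis, computed independently per token.
    return np.einsum("lt,ltd->td", weights, hidden_states)


rng = np.random.default_rng(0)
num_layers, seq_len, d_model, d_key = 4, 6, 16, 8
h = rng.standard_normal((num_layers, seq_len, d_model))
fused = aggregate_layers(h, rng.standard_normal(d_key),
                         rng.standard_normal((d_model, d_key)))
print(fused.shape)  # (6, 16)
```

With a zero query, every layer receives equal weight and the layer becomes a plain mean over layers. Training the query lets the model lean on whichever intermediate layers carry the most useful text features.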

What is STPO in the Krea 2 training pipeline?

STPO (Stabilized Training Preference Optimization) is Krea's custom variant of DPO (Direct Preference Optimization). Standard DPO can cause policy divergence, in which the model lowers the likelihood of both the winning and the losing samples, only at different rates, which leads to high-frequency artifacts. STPO adds an auxiliary loss and modifies the DPO objective to reduce this divergence.
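To make the failure mode concrete, here is a minimal sketch of the standard DPO loss on per-sample log-likelihoods. Diffusion variants substitute a denoising-error proxy for these log-likelihoods. The sketch also adds one hypothetical auxiliary term, a hinge penalty applied when the winner's log-likelihood drops below the reference model's. The report's actual STPO auxiliary loss and objective change are not described here, so the hinge term, `aux_weight`, and the function name `dpo_with_winner_anchor` are assumptions made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for either sign of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (winner, loser) pair."""
    # Implicit reward margin: how much more the policy, relative to the
    # frozen reference, prefers the winner over the loser.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


def dpo_with_winner_anchor(logp_w, logp_l, ref_logp_w, ref_logp_l,
                           beta=0.1, aux_weight=1.0):
    """DPO plus a HYPOTHETICAL auxiliary hinge term (not the STPO loss).

    DPO only sees the margin, so it can improve while both likelihoods fall.
    The hinge adds a penalty whenever the winner's log-likelihood drops
    below the reference model's.
    """
    aux = max(0.0, ref_logp_w - logp_w)
    return dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta) + aux_weight * aux


# Divergence case: the winner's likelihood falls by 2 nats and the loser's by 5.
# The margin is still +3, so plain DPO treats this as progress.
pair = dict(logp_w=-12.0, logp_l=-15.0, ref_logp_w=-10.0, ref_logp_l=-10.0)
print(dpo_loss(**pair))                # below log(2): DPO is satisfied
print(dpo_with_winner_anchor(**pair))  # larger by aux_weight * 2.0
```

The example shows why the margin alone is not enough. Both samples became less likely, yet the DPO loss is lower than at initialization, which is exactly the divergence the FAQ describes.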

Where can I access Krea 2 model weights?

The weights are available on Hugging Face and GitHub under a permissive license. The official release page is at krea.ai.

Krea 2 Technical Report: Open-Weights Image Model for Creative AI | explainx.ai Blog

Krea AI published the Krea 2 Technical Report on June 23, 2026 — a 58-minute read covering the full stack behind their new open-weights text-to-image foundation model series. The headline number: Krea 2 places in the top 10 on the Artificial Analysis text-to-image leaderboard and 2nd among independent labs, with model weights and inference code released under a permissive license.

Update (July 10, 2026): Reve 2.1 claims the #2 spot on Arena.ai and the title of top 4K model while training on ~10× fewer GPUs, another independent-lab efficiency signal in the same month.

The benchmark position is only part of what makes this report worth reading carefully. It also gives an unusually detailed account of every major decision in the pipeline, including several that go against common practice. No AI-generated training data. A custom DPO variant to prevent policy divergence. A PostgreSQL-based data warehouse built from scratch. A Kubernetes + Weka setup in which the entire cluster can flip to research training on demand.

Here is a structured breakdown.

The Core Thesis: Exploration Over Convergence

Most state-of-the-art image models have converged toward a narrow default aesthetic — reliable, polished, and predictable. Krea argues this makes them effective production tools but weak engines for creative exploration, where users need to search across styles and moods rather than receive a single best guess.

Krea 2 is explicitly designed around the opposite priority: wide aesthetic diversity first, with user-controllable navigation of that space through both text and image inputs.

Data Curation: What They Filter Out (Not In)

The most notable data decision is what Krea does not do.

No aesthetic score oversampling. Most pipelines use CLIP-based or IQA (image quality assessment) aesthetic scores to oversample "good" images. Krea argues this introduces implicit biases. For example, a motion-blurred image might score low yet represent a valid artistic choice. Their pretraining filters only remove:

Duplicates and over-represented concepts
Images where VLMs consistently fail to caption accurately
Images that introduce undesired artifacts and biases
High-complexity images that cannot be represented at low resolution
AI-generated images (more on this below)

Zero synthetic images in pretraining. This is unusual and deliberate. Krea's finding: even a small percentage of AI-generated images in a training mix creates an upper bound on model quality because synthetic images are disproportionately easy to learn, effectively pulling the training distribution toward them. They built in-house classifiers specifically to detect and remove synthetic images.
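The distinguishing idea is that curation only removes samples and never reweights them. Here is a minimal sketch of that policy. Every field name, the classifier threshold, and the `curate` function are hypothetical. The report's filters are learned models and heuristics, and only three of the five listed filters (duplicates, captioning failures, AI-generated images) are modeled here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are hypothetical stand-ins for upstream model outputs.
    image_hash: str         # e.g. a perceptual hash used for deduplication
    caption_ok: bool        # False if VLMs consistently fail to caption it
    synthetic_prob: float   # AI-image classifier score in [0, 1]
    aesthetic_score: float  # computed upstream but deliberately unused below


def curate(samples, synthetic_threshold=0.5):
    """Remove-only filtering: every surviving sample keeps weight 1.

    Note what is absent: no aesthetic-score reweighting or oversampling,
    so a low-scoring but intentional style (e.g. motion blur) survives.
    """
    seen = set()
    kept = []
    for s in samples:
        if s.image_hash in seen:  # duplicate of an earlier sample
            continue
        seen.add(s.image_hash)
        if not s.caption_ok:  # VLMs cannot caption it reliably
            continue
        if s.synthetic_prob >= synthetic_threshold:  # likely AI-generated
            continue
        kept.append(s)
    return kept


data = [
    Sample("a", True, 0.02, 2.1),   # blurry photo, low aesthetic score: kept
    Sample("a", True, 0.02, 2.1),   # exact duplicate: removed
    Sample("b", False, 0.01, 6.5),  # uncaptionable: removed
    Sample("c", True, 0.97, 8.9),   # high aesthetic score but synthetic: removed
]
print([s.image_hash for s in curate(data)])  # ['a']
```

The synthetic filter is the one place a hard cutoff matters most. Given Krea's finding that even a small synthetic fraction caps quality, a real pipeline would likely favor recall over precision when choosing that threshold. That is our inference, not a number from the report.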

Architecture: what survived ablation (baseline vs. final choice)

Component	Baseline	Final Choice
Attention	Multi-head	GQA + gated sigmoid attention
MLP	GeLU	SwiGLU (4× expansion)
Text encoder	T5-XXL	Qwen3-VL with multilayer feature aggregation
Modulation	Per-block MLP	Per-block tunable bias
Autoencoder	FLUX AE	Qwen Image VAE + FLUX 2 AE
Norm	LayerNorm	Zero-centered RMSNorm + QKNorm
Positional encoding	—	3D Axial RoPE
Block design	—	Single-stream transformer
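The table compresses several well-known building blocks. As a rough illustration, here is a NumPy sketch of three of them: zero-centered RMSNorm, SwiGLU with 4× expansion, and single-head attention with QKNorm and a sigmoid output gate. Two readings are assumptions. "Zero-centered RMSNorm" is taken to mean a scale parameterized as 1 + gamma, so a zero-initialized gamma is the identity. "Gated sigmoid attention" is taken to mean softmax attention whose output is multiplied elementwise by a sigmoid gate computed from the block input. The report may define either differently. Per-block tunable bias modulation and 3D Axial RoPE are not shown.

```python
import numpy as np


def rms_norm_zero_centered(x, gamma, eps=1e-6):
    # Assumed parameterization: effective scale is (1 + gamma), so
    # gamma = 0 leaves the normalized activations unscaled.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * (1.0 + gamma)


def silu(x):
    return x / (1.0 + np.exp(-x))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU: SiLU(x W_gate) * (x W_up), projected back to the model width.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def gated_attention(x, w_q, w_k, w_v, w_g, w_o, q_gamma, k_gamma):
    """Single-head self-attention with QKNorm and a sigmoid output gate."""
    q = rms_norm_zero_centered(x @ w_q, q_gamma)  # QKNorm on queries
    k = rms_norm_zero_centered(x @ w_k, k_gamma)  # QKNorm on keys
    v = x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Elementwise gate from the block input, applied to the attention output.
    return (sigmoid(x @ w_g) * (attn @ v)) @ w_o


def init(rng, fan_in, fan_out):
    # Simple scaled Gaussian init, for the demo only.
    return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)


rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.standard_normal((seq_len, d))
attn_out = gated_attention(x, *(init(rng, d, d) for _ in range(5)),
                           np.zeros(d), np.zeros(d))
mlp_out = swiglu_mlp(x, init(rng, d, 4 * d), init(rng, d, 4 * d),
                     init(rng, 4 * d, d))  # 4x hidden expansion
print(attn_out.shape, mlp_out.shape)  # (5, 8) (5, 8)
```

The sigmoid gate lets each output channel scale the attention result toward zero. A common motivation for this design is to avoid forcing softmax to dump probability mass on a few tokens when a head has nothing useful to contribute.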

Krea 2 Technical Report: Open-Weights Image Foundation Model Built for Creative Exploration

The Core Thesis: Exploration Over Convergence

Data Curation: What They Filter Out (Not In)

Deduplication in Practice

Sparse Autoencoder Tagging

Captioning Pipeline

Midtraining Data: Wikipedia PageRank for Entity Coverage

Architecture: What Survived Ablation

Why Gated Sigmoid Attention

The Timestep Modulation Decision

Text Encoder: Not Just the Last Layer

Training Pipeline: Five Stages

1. Pretraining (256px → 512px → 1024px)

2. Midtraining

3. Supervised Finetuning (SFT)

4. Preference Optimization (PO) + STPO

5. Reinforcement Learning (RL)

Timestep Distillation (Optional)

Prompt Expansion

Style Reference System

Infrastructure

Kubernetes + Kueue

Training Launch Procedure

Observability

Weka Filesystem

Krablet Data System

Future Work They Called Out

What to Take Away