On June 3, 2026, Ideogram released 4.0 — its first open-weight frontier text-to-image model. The weights are on GitHub and Hugging Face. The hosted API is live at developer.ideogram.ai.
The headline is not just "another open diffusion model." Ideogram 4.0 closes the quality gap between proprietary frontier image models and the open ecosystem on the axes that matter for production design work: typography in scene, deterministic layout, and 2K photoreal output. CEO Mohammad Norouzi put it directly: "The hardest problems at the forefront of design generation — headline-grade typography, deterministic layout, branded layered output — need a foundation engineered for them."
This guide covers what shipped, how the architecture differs from unified multimodal stacks, and how to run Ideogram 4.0 — via API, CLI, and self-hosted inference.
Quick reference
| Detail | Value |
|---|---|
| Release date | June 3, 2026 |
| Parameters | 9.3B |
| Architecture | Flow-matching DiT, single-stream, Qwen3-VL-8B text encoder |
| Max resolution | 2048×2048 (multiples of 16, aspect ratios up to 6:1) |
| Open weights | ideogram-oss/ideogram4 |
| Checkpoints | ideogram-4-nf4 (24GB GPU) · ideogram-4-fp8 |
| API endpoint | POST https://api.ideogram.ai/v1/ideogram-v4/generate |
| API pricing | Turbo $0.03 · Default $0.06 · Quality $0.10 per image |
| Prompt format | JSON-first (plain text via magic-prompt expansion) |
| GitHub stars | 2,100+ (as of June 2026) |
Jump to the path you need:
- What Ideogram 4.0 actually ships
- Benchmarks and where it ranks
- How to run via the Ideogram API
- How to run locally (CLI)
- JSON prompting and magic-prompt
- Bounding-box layout and color palettes
- API endpoints beyond generate
- When to use API vs local vs the app
What Ideogram 4.0 ships today
Three capabilities anchor the release, per Ideogram's press release:
1. Text rendering at production fidelity
Ideogram has led on in-scene typography since its 2023 launch. Version 4.0 extends that with multilingual support, denser type at smaller scales, and reliable rendering of headlines, packaging copy, and signage. In a ContraLabs blind evaluation judged by ten professional designers, Ideogram 4.0 was picked as best 47.9% of the time — ahead of Gemini 3.1 Flash Image Preview (30.0%), FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%).
2. Bounding-box layout control
You specify where a logo, headline, callout, or subject belongs on the canvas using normalized [y_min, x_min, y_max, x_max] coordinates on a 0–1000 grid. Layout is directed by the brief, not sampled and corrected afterward.
3. Photoreal output at 2K
Native support for resolutions from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1. For highest quality locally, the README recommends --height 2048 --width 2048 --sampler-preset V4_QUALITY_48.
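The size rules above are easy to check before a job is queued. The following is a minimal sketch, not code from the Ideogram repo: `validate_size` and `snap_to_16` are hypothetical helper names, and the sketch assumes the 6:1 limit is inclusive.

```python
MIN_SIDE = 256
MAX_SIDE = 2048
MAX_ASPECT = 6  # assumed inclusive: 6:1 is allowed, anything wider is not


def validate_size(width: int, height: int) -> None:
    """Raise ValueError if the size breaks the limits quoted above."""
    for name, value in (("width", width), ("height", height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            raise ValueError(f"{name}={value} is outside {MIN_SIDE}..{MAX_SIDE}")
        if value % 16 != 0:
            raise ValueError(f"{name}={value} is not a multiple of 16")
    ratio = max(width, height) / min(width, height)
    if ratio > MAX_ASPECT:
        raise ValueError(f"aspect ratio {ratio:.2f}:1 exceeds {MAX_ASPECT}:1")


def snap_to_16(value: int) -> int:
    """Round to the nearest multiple of 16 (ties round up), clamped to range."""
    snapped = (value + 8) // 16 * 16
    return max(MIN_SIDE, min(MAX_SIDE, snapped))
```

Snapping a requested side first and validating the pair afterwards keeps the aspect-ratio check on the values that will actually be sent.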
Layer-based roadmap
Most professional design work is not a single pixel layer. Ideogram 4.0 is the first step toward a layer-based generation stack:
| Capability | Status |
|---|---|
| Transparent background cutouts | Available via Background Remover API |
| Editable text + movable image layers | Follow-up 4.0 release |
| Branded assets (typography, palette, logo fidelity) | Scheduled |
Architecture: a specialized foundation, not a unified multimodal model
Ideogram 4.0 is a foundation model trained entirely from scratch — not a fine-tune or distillation of any existing checkpoint. Key architectural choices from the GitHub README:
| Component | Detail |
|---|---|
| Backbone | 34-layer single-stream Diffusion Transformer (DiT) — text and image tokens in one unified sequence |
| Text encoder | Qwen3-VL-8B-Instruct — hidden states from 13 intermediate layers concatenated |
| Training objective | Flow matching |
| Guidance | Dual-branch classifier-free guidance (independent positive/negative refinement) |
| Training data format | Structured JSON captions exclusively |
The bet is explicit: unified multimodal models (GPT Image, Gemini) are strong generalists, but headline-grade typography, deterministic layout, and brand fidelity require a foundation engineered for design specifically. At 9.3B parameters, Ideogram 4.0 delivers the best text rendering of any open-weight release Ideogram benchmarked — ahead of Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE).
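For readers new to flow matching, here is a one-dimensional toy of the sampling loop. It is a generic illustration, not Ideogram's sampler: the velocity field is hand-written rather than predicted by a DiT, the time convention (noise at t = 0, image at t = 1) is an assumption, and `cfg_velocity` shows the standard single-scale classifier-free guidance formula rather than the dual-branch variant listed in the table.

```python
def cfg_velocity(v_uncond: float, v_cond: float, scale: float) -> float:
    """Standard classifier-free guidance: start from the unconditional
    prediction and extrapolate toward the conditional one."""
    return v_uncond + scale * (v_cond - v_uncond)


def toy_velocity(x: float, t: float, target: float) -> float:
    """Velocity of the straight-line path that arrives at `target` when t = 1.
    A trained model would predict this from (x, t, prompt) instead."""
    return (target - x) / (1.0 - t)


def euler_sample(x0: float, target: float, steps: int) -> float:
    """Integrate dx/dt = v(x, t) with Euler steps from t = 0 up to t = 1."""
    x = x0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # the last evaluated t is 1 - dt, so no division by zero
        x += toy_velocity(x, t, target) * dt
    return x
```

In a real sampler the step count is what a preset name such as V4_QUALITY_48 most plausibly refers to, though the source does not spell that out.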
For a general primer on how diffusion image models work under the hood, see our diffusion explainer.
Benchmarks: where Ideogram 4.0 ranks
| Benchmark | Result |
|---|---|
| Design Arena (overall) | Top open-weight model; trails only proprietary GPT and Gemini |
| Design Arena (open-weight only) | #1 by commanding margin |
| ContraLabs typography (1st-place win rate) | 47.9% |
| ContraLabs "would use in client work" | 3.55 / 5 |
| LMArena text-to-image | Top open-weight lab, top-5 overall |
| 7Bench (layout control) | Better than all closed-source models tested |
| Internal human-preference (design + photography) | #2 overall — behind only GPT Image 2 medium |
The pattern is consistent across these benchmarks: Ideogram 4.0 ranks first among open-weight image models and sits at the frontier of design-oriented generation.
How to run Ideogram 4.0 via the API
The fastest path for production pipelines. No GPU required.
Step 1: Get an API key
- Sign up at developer.ideogram.ai
- Add payment method in the API Dashboard (billing is separate from the Ideogram app subscription)
- Copy your Api-Key
Step 2: Generate your first image
Python:
import requests

response = requests.post(
    "https://api.ideogram.ai/v1/ideogram-v4/generate",
    headers={"Api-Key": "<your-api-key>"},
    json={
        "text_prompt": "A poster for a summer design conference with bold sans-serif typography",
        "rendering_speed": "DEFAULT",
        "aspect_ratio": "ASPECT_16_9",
    },
    timeout=120,  # avoid hanging forever on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before parsing the body
image = response.json()["data"][0]
print(image["url"])
cURL:
curl -X POST https://api.ideogram.ai/v1/ideogram-v4/generate \
  -H "Api-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "text_prompt": "A poster for a summer design conference",
    "rendering_speed": "TURBO"
  }'
TypeScript:
const res = await fetch("https://api.ideogram.ai/v1/ideogram-v4/generate", {
  method: "POST",
  headers: {
    "Api-Key": "<your-api-key>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text_prompt: "A poster for a summer design conference",
    rendering_speed: "DEFAULT",
  }),
});
// Fail loudly on HTTP errors instead of destructuring an error body.
if (!res.ok) throw new Error(`Ideogram API returned ${res.status}`);
const { data } = await res.json();
console.log(data[0].url);
API pricing and speed tiers
| Rendering speed | Price per image | Use case |
|---|---|---|
| TURBO | $0.03 | Rapid prototyping, A/B testing |
| DEFAULT | $0.06 | Daily production work |
| QUALITY | $0.10 | Final delivery assets |
No subscription required. Default rate limit: 10 in-flight requests. For higher throughput, contact [email protected].
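One way to stay under the in-flight limit is to cap concurrency on the client. This is a minimal sketch using only the standard library; `run_capped` is a hypothetical helper, and `generate` stands in for whatever function you use to call the API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_IN_FLIGHT = 10  # default limit quoted above


def run_capped(
    jobs: Iterable[str],
    generate: Callable[[str], str],
    max_in_flight: int = MAX_IN_FLIGHT,
) -> List[str]:
    """Call `generate` for every job with at most `max_in_flight` running
    at the same time. Results come back in the same order as `jobs`."""
    # The pool never runs more than max_workers tasks concurrently,
    # which is what enforces the cap.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(generate, jobs))
```

Because `pool.map` re-raises the first exception it meets, a production version would likely wrap `generate` with its own retry and error capture.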
Important: Image URLs are ephemeral — download and store results in your own system immediately after generation.
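A small helper can make that download step hard to forget. The sketch below uses only the standard library; `persist_image` and `fetch_bytes` are hypothetical names, and the `fetch` parameter exists so the function can be exercised without a network call.

```python
import pathlib
import urllib.request
from typing import Callable


def fetch_bytes(url: str) -> bytes:
    """Download the body of `url` with a 30-second timeout."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def persist_image(
    url: str,
    dest: pathlib.Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> pathlib.Path:
    """Copy an ephemeral image URL into durable storage at `dest`."""
    data = fetch(url)
    if not data:
        raise ValueError(f"empty response for {url}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a crash mid-write does not
    # leave a truncated file at the final path.
    tmp = dest.with_suffix(dest.suffix + ".part")
    tmp.write_bytes(data)
    tmp.replace(dest)
    return dest
```

Calling this right after the generate response arrives, before any further processing, keeps the window in which the URL can expire as short as possible.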
How to run Ideogram 4.0 locally (CLI)
Self-host when you need gradients, fine-tuning, or air-gapped inference.
Prerequisites
- CUDA GPU with 24GB VRAM for the NF4 checkpoint; the FP8 checkpoint targets a broader range of hardware (see the Model checkpoints table)
- Python 3.10+
- Hugging Face account with accepted license gate
Step 1: Clone and install
git clone https://github.com/ideogram-oss/ideogram4.git
cd ideogram4
pip install .
For development, use editable mode: pip install -e .
Step 2: Accept the license gate and authenticate
- Open ideogram-ai/ideogram-4-nf4 on Hugging Face
- Click Agree and access repository
- Authenticate:
hf auth login
# or: export HF_TOKEN="hf_..."
Without this step, downloads fail with 404 / GatedRepoError.
Step 3: Generate with plain-text prompt
Plain --prompt is expanded into structured JSON by magic-prompt — Ideogram's hosted LLM expansion, which is free and requires only an API key:
export IDEOGRAM_API_KEY="your_key_from_developer.ideogram.ai"
python run_inference.py \
--prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
--output out.png \
--quantization "nf4" \
--magic-prompt-key "$IDEOGRAM_API_KEY"
Step 4: Max quality settings
For 2K output with the quality sampler preset:
python run_inference.py \
--prompt "a campaign poster with clean sans-serif typography" \
--output poster.png \
--quantization "nf4" \
--height 2048 \
--width 2048 \
--sampler-preset V4_QUALITY_48 \
--magic-prompt-key "$IDEOGRAM_API_KEY"
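If you drive the CLI from a script, building the argument list in one place avoids shell-quoting mistakes. This is a sketch, assuming `run_inference.py` is in the current directory and accepts exactly the flags shown in the commands above; `build_command` is a hypothetical helper, not part of the repo.

```python
import os
import shlex
from typing import List, Optional


def build_command(
    prompt: str,
    output: str,
    quantization: str = "nf4",
    height: Optional[int] = None,
    width: Optional[int] = None,
    sampler_preset: Optional[str] = None,
    magic_prompt_key: Optional[str] = None,
) -> List[str]:
    """Return an argv list for run_inference.py using only flags from this guide."""
    cmd = [
        "python", "run_inference.py",
        "--prompt", prompt,
        "--output", output,
        "--quantization", quantization,
    ]
    if height is not None:
        cmd += ["--height", str(height)]
    if width is not None:
        cmd += ["--width", str(width)]
    if sampler_preset is not None:
        cmd += ["--sampler-preset", sampler_preset]
    if magic_prompt_key is not None:
        cmd += ["--magic-prompt-key", magic_prompt_key]
    return cmd


# Example: the 2K quality run from Step 4, printed as a shell-safe string.
quality_cmd = build_command(
    "a campaign poster with clean sans-serif typography",
    "poster.png",
    height=2048,
    width=2048,
    sampler_preset="V4_QUALITY_48",
    magic_prompt_key=os.environ.get("IDEOGRAM_API_KEY"),
)
print(shlex.join(quality_cmd))
```

Passing the list to `subprocess.run(quality_cmd, check=True)` would execute it without involving a shell.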
Optional: safety screening with Hive
For production deployments, enable prompt and output moderation via Hive:
export HIVE_TEXT_MODERATION_KEY="..."
export HIVE_VISUAL_MODERATION_KEY="..."
python run_inference.py \
--prompt "an isometric illustration of a tiny city floating in the clouds" \
--output out.png \
--quantization "nf4" \
--magic-prompt-key "$IDEOGRAM_API_KEY" \
--hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
--hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"
Model checkpoints
| Checkpoint | Quantization | Hardware | Diffusers |
|---|---|---|---|
| ideogram-4-nf4 | NF4 | CUDA (24GB) | Yes |
| ideogram-4-fp8 | FP8 | All | No |
See docs/inference.md for sampler presets, parameter reference, and optimization tips.
JSON prompting: the format that matters
Ideogram 4.0 was trained exclusively on structured JSON captions. Plain text works — but JSON is the native language.
Why JSON-only training?
From the prompting guide:
We train exclusively on JSON so that training and inference share a single, common prompt format. The training captions themselves are deliberately extremely descriptive: each JSON exhaustively describes everything in the image.
Plain-text prompts create a mismatch between training and inference. JSON mirrors the training distribution and unlocks full model quality.
The caption schema (three top-level fields)
{
  "high_level_description": "A clean business card layout for a tech startup.",
  "style_description": {
    "aesthetics": "minimal, professional, geometric",
    "lighting": "even, diffuse studio lighting",
    "medium": "graphic_design",
    "art_style": "flat vector design, generous whitespace, sans-serif typography",
    "color_palette": ["#FFFFFF", "#F0F0F0", "#333333", "#0066FF", "#00CC88"]
  },
  "compositional_deconstruction": {
    "background": "A solid off-white card surface with subtle paper texture.",
    "elements": [
      {
        "type": "text",
        "text": "ACME TECH",
        "desc": "Bold dark grey sans-serif company name across the upper third."
      },
      {
        "type": "text",
        "text": "[email protected]",
        "desc": "Small blue sans-serif contact email near the bottom."
      }
    ]
  }
}
| Field | Required | Purpose |
|---|---|---|
| high_level_description | Strongly recommended | One- or two-sentence summary |
| style_description | Optional | Aesthetics, lighting, medium, color palette |
| compositional_deconstruction | Required | Background + spatial elements |
Element types: "obj" for objects/subjects, "text" for in-image text (include a text field with the literal string to render).
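The field rules above can be checked locally before a caption is sent anywhere. The following is a minimal sketch covering only the rules stated in this section; `validate_caption` is a hypothetical helper, and the real schema in the prompting guide may enforce more.

```python
from typing import Any, Dict, List

ELEMENT_TYPES = {"obj", "text"}


def validate_caption(caption: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the caption passes
    the checks described in this section."""
    decon = caption.get("compositional_deconstruction")
    if not isinstance(decon, dict):
        return ["compositional_deconstruction is required"]
    problems: List[str] = []
    if not caption.get("high_level_description"):
        problems.append("high_level_description is strongly recommended")
    for i, element in enumerate(decon.get("elements", [])):
        kind = element.get("type")
        if kind not in ELEMENT_TYPES:
            problems.append(f"elements[{i}]: unknown type {kind!r}")
        elif kind == "text" and not element.get("text"):
            # Text elements must carry the literal string to render.
            problems.append(f"elements[{i}]: text element needs a 'text' field")
    return problems
```

Returning a list instead of raising on the first issue lets a caller report every problem in one pass.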
Magic-prompt: JSON without writing JSON
Don't want to hand-write captions? Magic-prompt expands plain text into full structured JSON before generation.
Three backends ship in the repo:
| Config | Registry key | Backend |
|---|---|---|
| Ideogram4MagicPromptV1 | ideogram-4-v1 | Ideogram hosted API (free) |
| ClaudeOpusMagicPromptV1 | claude-opus-v1 | OpenRouter |
| ClaudeSonnetMagicPromptV1 | claude-sonnet-v1 | OpenRouter |
The hosted ideogram-4-v1 backend is the default in run_inference.py and only needs IDEOGRAM_API_KEY. The magic-prompt system prompts are open source in src/ideogram4/magic_prompt_system_prompts/.
Via the API, two endpoints scaffold the JSON workflow:
| Endpoint | Purpose |
|---|---|
| POST /v1/ideogram-v4/magic-prompt | Convert plain text → structured json_prompt |
| POST /v1/ideogram-v4/describe | Upload a reference image → structured JSON prompt (preserves bboxes optionally) |
Practical workflow: Start with text_prompt for fast ideation. Migrate to json_prompt once layout precision, brand hex colors, or multi-line typography matter.
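That migration is easier when one function builds the request body for both modes. The sketch below is an assumption-heavy illustration: the field names `text_prompt`, `json_prompt`, and `rendering_speed` come from this guide, but sending `json_prompt` as a nested object (rather than a serialized string) is a guess to verify against the API docs, and `build_generate_payload` is a hypothetical helper.

```python
from typing import Any, Dict, Union

RENDERING_SPEEDS = {"TURBO", "DEFAULT", "QUALITY"}


def build_generate_payload(
    prompt: Union[str, Dict[str, Any]],
    rendering_speed: str = "DEFAULT",
) -> Dict[str, Any]:
    """Build a request body: plain strings go in text_prompt,
    structured captions go in json_prompt."""
    if rendering_speed not in RENDERING_SPEEDS:
        raise ValueError(f"unknown rendering_speed: {rendering_speed!r}")
    if isinstance(prompt, str):
        return {"text_prompt": prompt, "rendering_speed": rendering_speed}
    if isinstance(prompt, dict):
        # Assumption: the API accepts the caption as a nested JSON object.
        return {"json_prompt": prompt, "rendering_speed": rendering_speed}
    raise TypeError("prompt must be a str or a caption dict")
```

With this in place, moving a prompt from ideation to production is a change of argument type rather than a change of call site.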
Bounding-box layout and color palettes
Spatial control with bbox
Each element can include a bounding box in normalized 0–1000 coordinates (origin top-left):
{
"type": "text",
"bbox": [100, 200, 300, 800],
"text": "SUMMER SALE",
"desc": "Large bold red headline across the upper center of the poster."
}
Format: [y_min, x_min, y_max, x_max]. This is native to the model — no ControlNet pipeline required.
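Design tools usually report boxes in pixels as left/top/right/bottom, so a conversion step is needed. This is a minimal sketch of that conversion; `to_bbox` is a hypothetical helper, and rounding to the nearest grid unit is an assumption.

```python
from typing import List


def to_bbox(
    left: float, top: float, right: float, bottom: float,
    canvas_w: float, canvas_h: float,
) -> List[int]:
    """Convert a pixel box to [y_min, x_min, y_max, x_max] on the 0-1000 grid."""
    if not (left < right and top < bottom):
        raise ValueError("box must have positive width and height")

    def norm(value: float, size: float) -> int:
        # Scale to the 0-1000 grid and clamp boxes that spill off the canvas.
        return max(0, min(1000, round(1000 * value / size)))

    # Note the order: y comes before x in the model's bbox format.
    return [
        norm(top, canvas_h),
        norm(left, canvas_w),
        norm(bottom, canvas_h),
        norm(right, canvas_w),
    ]
```

On a 2000×2000 canvas, a headline spanning x 400 to 1600 and y 200 to 600 maps to the `[100, 200, 300, 800]` box used in the example above.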
Color palette conditioning
Steer dominant colors with hex codes in style_description.color_palette:
"color_palette": ["#1B1B2F", "#162447", "#1F4068", "#E43F5A", "#F5F5F5"]
Rules from the prompting guide:
- Up to 16 colors in style_description.color_palette
- Up to 5 colors per element
- Uppercase hex only: #RRGGBB form (not #fff or lowercase)
- Include both highlight and shadow colors for controlled lighting
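Brand palettes often arrive as shorthand or lowercase hex, so it helps to normalize them before building a caption. The following is a sketch of the rules listed above; `normalize_hex` and `check_palette` are hypothetical helper names.

```python
import re
from typing import List

HEX_RE = re.compile(r"#[0-9A-F]{6}")
MAX_STYLE_COLORS = 16  # limit for style_description.color_palette
MAX_ELEMENT_COLORS = 5  # limit for a single element


def normalize_hex(color: str) -> str:
    """Turn '#fff' or '#e43f5a' into uppercase #RRGGBB form."""
    digits = color.strip().lstrip("#")
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # expand shorthand
    result = "#" + digits.upper()
    if not HEX_RE.fullmatch(result):
        raise ValueError(f"not a hex color: {color!r}")
    return result


def check_palette(colors: List[str], limit: int = MAX_STYLE_COLORS) -> List[str]:
    """Return a list of rule violations; empty means the palette is acceptable."""
    problems = []
    if len(colors) > limit:
        problems.append(f"{len(colors)} colors exceeds the limit of {limit}")
    for color in colors:
        if not HEX_RE.fullmatch(color):
            problems.append(f"{color!r} is not uppercase #RRGGBB")
    return problems
```

Pass `limit=MAX_ELEMENT_COLORS` when checking a palette attached to a single element.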
On 7Bench (layout control), Ideogram 4.0 scored significantly better than all closed-source models tested — the bbox + palette system is the differentiator.
API endpoints beyond generate
The Ideogram API is not just text-to-image. Full capability list from ideogram.ai/api-learn:
| Capability | Endpoint family | Notes |
|---|---|---|
| Generate | /v1/ideogram-v4/generate | Text or JSON prompt → image |
| Transparent backgrounds | v4 endpoints | Native alpha cutouts |
| Edit with prompt | v3 endpoints | Describe changes in plain language |
| Remix | v3 endpoints | Reimagine with image_weight control |
| Reframe | v3 endpoints | Extend to new aspect ratio |
| Remove background | v4 endpoints | Clean cutout in one call |
| Layerized text | v3 endpoints | Pull editable text layers |
| Custom models | Training + generate | Fine-tune on brand assets |
| Upscale | Upscale endpoint | Raise resolution for delivery |
| Magic-prompt | /v1/ideogram-v4/magic-prompt | Plain text → JSON caption |
| Describe | /v1/ideogram-v4/describe | Image → JSON caption |
Ideogram 4.0 also supports MCP for agent workflows — useful if you're wiring image generation into coding agents or design automation pipelines. For agent harness concepts, see our Agent Harness guide.
When to use API vs local vs the app
| Surface | Best for | Trade-off |
|---|---|---|
| Ideogram app | Hands-on creation, iteration, editing | Subscription credits; no programmatic access |
| API | Production pipelines, product integration, agents | Per-image cost; ephemeral URLs |
| Local (CLI) | Fine-tuning, research, air-gapped, unlimited gen | 24GB GPU; magic-prompt still needs API key (free) |
| ComfyUI | Node-based visual workflows | Requires ComfyUI 0.24.0+ and image_ideogram4_t2i.json template |
For most developers building image generation into a product, start with the API (Turbo at $0.03/image for prototyping). Move to local inference when you need custom fine-tunes, synthetic data pipelines, or on-premise deployment.
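To budget an API workload, multiply image counts by the per-image prices in the pricing table. Here is a short sketch using `Decimal` to avoid floating-point rounding on currency; `estimate_cost` is a hypothetical helper, and the prices are the ones quoted in this post, which may change.

```python
from decimal import Decimal
from typing import Dict

# Per-image prices in USD as listed in this guide.
PRICE_PER_IMAGE = {
    "TURBO": Decimal("0.03"),
    "DEFAULT": Decimal("0.06"),
    "QUALITY": Decimal("0.10"),
}


def estimate_cost(image_counts: Dict[str, int]) -> Decimal:
    """Total cost in USD for a mix of rendering speeds."""
    total = Decimal("0")
    for speed, count in image_counts.items():
        if count < 0:
            raise ValueError(f"negative image count for {speed}")
        total += PRICE_PER_IMAGE[speed] * count  # KeyError on unknown speed
    return total


# 200 Turbo drafts plus 10 Quality finals: 6.00 + 1.00 dollars.
print(estimate_cost({"TURBO": 200, "QUALITY": 10}))
```

At these prices, a prototyping round of 200 Turbo drafts costs the same as 60 Quality renders.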
For comparison with other 2026 image models, see our posts on ChatGPT Images 2.0 / gpt-image-2 and the diffusion fundamentals guide.
Enterprise and commercial licensing
Commercial use of the open weights is covered by Ideogram's commercial license. Key points from the press release:
- Fine-tuning on brand data with weights, training data, and inference staying on customer infrastructure
- Headquartered in Toronto and San Francisco — no embedded political alignment in weights
- Commercial license tiers at ideogram.ai/licensing
- Enterprise inquiries → [email protected]
The NF4/FP8 Hugging Face checkpoints use a non-commercial license for the open release. Commercial use through the API or enterprise licensing is the production path.
Summary
Ideogram 4.0 is the most significant open-weight image release of 2026 for anyone who ships visual assets — not hobbyists generating cats, but teams that need readable type, controlled layout, and 2K fidelity.
Three things to remember:
- JSON is the native prompt format. Use magic-prompt for casual input; write JSON when layout and typography matter.
- Three ways in: API for products, CLI for research/self-hosting, app for hands-on design.
- It closes the open-vs-closed gap on design benchmarks while staying at 9.3B parameters — a fraction of FLUX.2 [dev]'s 32B.
Related reading
- ChatGPT Images 2.0 and gpt-image-2 — OpenAI's April 2026 flagship image model
- How Diffusion Image Generation Works — noise schedules, latents, and CFG explained
- What Is an Agent Harness? — wiring tools like Ideogram MCP into agent loops
- Design.md: Open Spec for AI Design Systems — structured design specs for AI workflows
Official sources: Ideogram 4.0 press release · GitHub repo · API docs · Prompting guide · Technical blog
Model specs, API pricing, and benchmark numbers in this post reflect publicly available information as of June 20, 2026. Verify current pricing and license terms at ideogram.ai before production deployment.