What is Cohere Command A+?

Command A+ is a 218-billion-parameter Sparse Mixture-of-Experts (MoE) language model released by Cohere on May 20, 2026, under a full Apache 2.0 open-source license. With only 25 billion parameters active during generation, it is positioned as a frontier-class model that runs on a single NVIDIA Blackwell B200 GPU or just 2 H100 GPUs. It is Cohere's first fully Apache-licensed model.

What makes Command A+ different from other open models?

Three key differentiators: (1) Native citation generation with explicit grounding spans linking every claim to source documents. (2) W4A4 quantization, described by Cohere as lossless, enabling deployment on 2 H100s without quality degradation. (3) A full Apache 2.0 license covering all components, not just the model weights, making it the first fully open frontier model from Cohere for sovereign AI use.

What is W4A4 quantization?

W4A4 is 4-bit quantization for both weights (W) and activations (A). Relative to 16-bit weights, it compresses the model to 25% of its original size, and Cohere reports that it does so with lossless performance. This allows Command A+ to run on 2 H100 GPUs instead of requiring 8 or more. Cohere describes this as a breakthrough in quantization techniques.
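To make the 4-bit idea concrete, here is a minimal sketch of symmetric round-to-nearest quantization applied to a small group of weights. It is a generic textbook scheme, not Cohere's W4A4 method, which this article does not detail. Unlike the lossless result Cohere reports, this simple version introduces visible rounding error. The function names and sample weights are illustrative assumptions.

```python
def quantize_int4(values):
    """Symmetric round-to-nearest quantization to signed 4-bit integers."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero group.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # The signed 4-bit range is [-8, 7]; clamp after rounding.
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate floating-point values from 4-bit codes."""
    return [c * scale for c in codes]


# Made-up sample weights, for illustration only.
weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.02, 0.64]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage ratio of 4-bit codes relative to 16-bit weights (ignoring the scale).
size_ratio = 4 / 16

print(codes)
print(f"max rounding error: {max_error:.4f}")
print(f"size relative to 16-bit: {size_ratio:.0%}")
```

The rounding error of each weight is bounded by half the scale, which is why production schemes put most of their effort into choosing scales and groupings carefully.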

What are native citations in Command A+?

Command A+ generates explicit 'grounding spans' that directly link every factual claim to the specific source document or database row it came from. Rather than being attached after the fact, citations are generated natively during inference, which is intended to make them more accurate and to reduce hallucination.
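As an illustration of how a client application might consume grounding spans, the sketch below attaches source markers to an answer string. The `GroundingSpan` structure, its field names, the `annotate` helper, and the sample answer and sources are all hypothetical; this article does not specify Command A+'s actual output format.

```python
from dataclasses import dataclass


@dataclass
class GroundingSpan:
    """Hypothetical span record; not Command A+'s documented schema."""
    start: int         # character offset where the cited claim begins
    end: int           # character offset just past the cited claim
    source_ids: list   # identifiers of the supporting documents or rows


def annotate(answer: str, spans: list) -> str:
    """Insert [source] markers after each cited span (spans must not overlap)."""
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        out.append(answer[cursor:span.end])
        out.append("[" + ", ".join(span.source_ids) + "]")
        cursor = span.end
    out.append(answer[cursor:])
    return "".join(out)


# Made-up answer and sources, for illustration only.
answer = "Revenue rose 12% in Q4. Headcount stayed flat."
spans = [
    GroundingSpan(start=0, end=22, source_ids=["q4_report.pdf"]),
    GroundingSpan(start=24, end=45, source_ids=["hr_db:row_17"]),
]
print(annotate(answer, spans))
```

Keeping spans as character offsets rather than inline markup lets the application decide how to render citations (footnotes, hover cards, links) without re-parsing the answer text.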

How does Command A+ compare to other frontier models?

Command A+ shows across-the-board improvements on agentic, reasoning, and multi-step tasks compared to previous Command A models. It also delivers over 2× faster output speed and 30% lower latency. Compared to GPT-OSS (unclear which variant), Command A+ shows competitive or superior performance on benchmarks, though full public comparisons are pending.

What is sovereign AI and why does Command A+ matter?

Sovereign AI refers to nations and enterprises running AI infrastructure on their own terms—controlling the model, data, and deployment. Command A+ enables this by providing full Apache 2.0 licensing (no restrictions), efficient deployment (2 H100s), and 48-language support. Countries can run frontier AI without dependency on US cloud providers or closed APIs.

Where can I download Command A+?

The W4A4 quantized version is available on Hugging Face at CohereLabs/command-a-plus-05-2026-w4a4. Full model weights and BF16/FP8 variants are also available. The model is compatible with vLLM and standard transformers pipelines. More details are at cohere.com/blog/command-a-plus.

Cohere Command A+: the first fully Apache 2.0 enterprise | explainx.ai Blog

On May 20, 2026, Cohere released Command A+—a 218 billion parameter Sparse Mixture-of-Experts (MoE) language model with 25 billion active parameters and full Apache 2.0 open-source licensing. The model features native citation generation (explicit grounding spans linking every claim to source documents), W4A4 lossless quantization (enabling deployment on just 2 NVIDIA H100 GPUs), and 48-language support with improved efficiency in non-European languages. Command A+ is Cohere's first fully Apache-licensed frontier model, positioning sovereign AI as accessible to enterprises and nations seeking to control their own AI infrastructure. The release marks a breakthrough in quantization techniques and a strategic shift toward open-weight models for critical infrastructure.

This article is a field guide: what Command A+ is, key features, benchmarks, sovereign AI context, deployment options, and when to choose Command A+ over closed models.

TL;DR

Question	Short answer
What is it?	A 218B parameter MoE model (25B active) with native citations, W4A4 quantization, and full Apache 2.0 license—first fully open frontier model from Cohere.
Announced	May 20, 2026 by Cohere.
Key innovation	W4A4 lossless quantization—4-bit weights + activations with no quality degradation, enabling 2-H100 deployment.
Native citations	Generates explicit grounding spans linking every factual claim to specific source documents or database rows.

Format	Precision	VRAM required	Speed	Quality
BF16	16-bit	~400GB (8+ H100s)	Baseline	Baseline
FP8	8-bit	~200GB (4 H100s)	1.5× faster	Minimal loss
W4A4	4-bit	~80GB (2 H100s)	2× faster	Lossless
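As a rough cross-check of the VRAM column, the following back-of-envelope sketch computes a weights-only footprint from the 218B total parameter count. It assumes every parameter is stored at the stated precision and ignores KV cache, activations, and quantization metadata, so it is an estimate rather than a measurement. The helper name is an illustrative assumption.

```python
TOTAL_PARAMS = 218e9  # total parameter count stated in this article


def weight_footprint_gb(params: float, bits_per_param: int) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return params * bits_per_param / 8 / 1e9


for name, bits in [("BF16", 16), ("FP8", 8), ("W4A4", 4)]:
    print(f"{name}: ~{weight_footprint_gb(TOTAL_PARAMS, bits):.0f} GB")
```

This naive estimate gives about 436 GB, 218 GB, and 109 GB, which is in the same range as the table's BF16 and FP8 rows. The table's ~80GB figure for W4A4 is lower than the naive estimate, so it presumably rests on details not given here. Either way, 2 H100s at 80GB each provide 160GB in total.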

Model	License	Commercial use	Modifications	Redistribution	Sovereign use
Command A+	Apache 2.0	✅ Unlimited	✅ Yes	✅ Yes	✅ Yes
Llama 3	Custom (Meta)	✅ With restrictions	✅ Yes	❌ Restricted	❌ Restricted
Mistral Large	Mistral AI License	✅ Tiered	✅ Limited	❌ No	❌ No
GPT-4	Closed	❌ API only	❌ No	❌ No	❌ No
Gemini	Closed	❌ API only	❌ No	❌ No	❌ No

Format	GPUs	VRAM	Throughput
BF16	8× H100	~400GB	Baseline
FP8	4× H100	~200GB	1.5× faster
W4A4	2× H100	~80GB	2× faster
Blackwell B200	1× GPU	192GB	2× faster

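bash

# Install dependencies (accelerate is required for device_map="auto")
pip install transformers torch accelerate

python

# Load the W4A4 quantized model and its tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereLabs/command-a-plus-05-2026-w4a4"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread layers across the available GPUs
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize a prompt, generate up to 512 new tokens, and print the decoded text
inputs = tokenizer("What were Q4 2025 revenues?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))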
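Since the article states that the model is compatible with vLLM, a serving-oriented variant might look like the sketch below. The GPU counts are copied from the deployment table above. `build_llm_kwargs` and `generate_with_vllm` are hypothetical helpers, and whether a given vLLM release actually loads this checkpoint is an assumption based on the article's compatibility claim. The vLLM import is kept inside the function so the helper can be inspected on a machine without GPUs.

```python
# GPU counts per format, copied from the deployment table in this article.
GPUS_BY_FORMAT = {"BF16": 8, "FP8": 4, "W4A4": 2}


def build_llm_kwargs(model_id: str, fmt: str) -> dict:
    """Hypothetical helper: pick tensor parallelism from the table above."""
    return {"model": model_id, "tensor_parallel_size": GPUS_BY_FORMAT[fmt]}


def generate_with_vllm(prompt: str, model_id: str, fmt: str = "W4A4") -> str:
    """Run one prompt through vLLM's offline engine (requires GPUs and vllm installed)."""
    from vllm import LLM, SamplingParams  # imported lazily on purpose

    llm = LLM(**build_llm_kwargs(model_id, fmt))
    params = SamplingParams(max_tokens=512)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


# Inspect the engine arguments without loading the model.
print(build_llm_kwargs("CohereLabs/command-a-plus-05-2026-w4a4", "W4A4"))
```

On a machine with 2 H100s, calling `generate_with_vllm("What were Q4 2025 revenues?", "CohereLabs/command-a-plus-05-2026-w4a4")` would shard the model across both GPUs through tensor parallelism.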

Factor	Command A+	Llama 3.1 (405B)	Mistral Large 2	GPT-4	Gemini 1.5 Pro
Parameters	218B (25B active)	405B (dense)	123B (dense)	Unknown	Unknown
License	Apache 2.0 (full)	Meta (restricted)	Mistral (tiered)	Closed	Closed
Deployment	2 H100s (W4A4)	8+ H100s	4 H100s	API only	API only
Native citations	Yes	No	No	No	No
Languages	48	~20	~12	~50+	~50+
Sovereign AI	Yes	Restricted	No	No	No
Speed (vs baseline)	2× faster	Baseline	1.5× faster	Unknown	Unknown

Cohere Command A+: the first fully Apache 2.0 enterprise AI model that runs on 2 H100s (May 2026)

TL;DR

What is Command A+?

Feature 01: Native citation generation with grounding spans

Feature 02: W4A4 lossless quantization—breakthrough efficiency

Feature 03: 48-language support with non-European efficiency

Feature 04: Agentic workflow optimization

Feature 05: Full Apache 2.0 licensing—sovereign AI

Benchmarks and performance

Speed and efficiency

Reasoning and agentic tasks

Multimodal document processing

Use cases: sovereign AI, enterprise RAG, critical infrastructure

01. Sovereign AI for nations

02. Enterprise RAG over internal documents

03. Critical infrastructure (defense, healthcare, finance)

Deployment: 2 H100s, vLLM, transformers

Command A+ vs other frontier models

Limitations and future work

Sources
