What is Cohere North Mini Code?

North Mini Code is Cohere's first open-source agentic coding model, released June 9, 2026 under the Apache 2.0 license. It is a 30B-parameter mixture-of-experts (MoE) model with just 3B active parameters, optimized for code generation, agentic software engineering, and terminal tasks, with a 256K context window.

How does North Mini Code compare to Devstral Small 2?

In Cohere's internal tests, North Mini Code achieved up to 2.8x higher output throughput than Devstral Small 2 under identical concurrency and hardware configurations. It also demonstrated a 30% advantage in inter-token latency, though Devstral Small 2 maintained a slight edge in time-to-first-token (TTFT) performance.

How does North Mini Code perform on benchmarks?

North Mini Code achieves competitive scores on SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench v2, and Terminal-Bench Hard. It scores 33.4 on the Artificial Analysis Coding Index, positioning it competitively among similarly sized open-source models for agentic software engineering tasks.

What hardware does North Mini Code require?

The minimum hardware requirement is 1× H100 GPU at FP8 precision. The model's efficient mixture-of-experts architecture—with only 3B active parameters out of 30B total—enables it to run on modest hardware compared to dense models of equivalent capability.

How can I deploy North Mini Code?

North Mini Code is available through multiple channels: download weights from Hugging Face, deploy on Cohere Model Vault (managed inference), access via Cohere API, or use through OpenRouter. It's specifically optimized for OpenCode harness but works with most coding agents.

Cohere North Mini Code: Open-Source Agentic Coding (Apache 2.0) | explainx.ai Blog


TL;DR: On June 9, 2026, Cohere launched North Mini Code, its first open-source agentic coding model, under the Apache 2.0 license. A 30B-parameter mixture-of-experts (MoE) model with just 3B active parameters, North Mini Code delivers competitive performance on SWE-Bench Verified, SWE-Bench Pro, and Terminal-Bench 2.0 while achieving up to 2.8x higher output throughput than Devstral Small 2 in Cohere's internal tests. Built for sovereign AI deployment, it supports a 256K context window, runs on a single H100 GPU at FP8, and is optimized for agentic software engineering workflows including sub-agent orchestration, systems architecture mapping, and code reviews.

Cohere's Entry into Open-Source Developer Models

After establishing itself with enterprise-focused models like Command A+, Cohere is expanding into the developer ecosystem with North Mini Code—the first model in their next generation of powerful, open-source AI systems.

The Sovereign AI Positioning

Core Mission: Enable developers to deploy agentic coding capabilities on their own terms—on-premises, locally, or in private clouds—without vendor lock-in or usage restrictions.

Market Context:

Closed Ecosystem: GPT-5.5, Claude Fable 5, Gemini 3.5 require API access
Open Alternatives: DeepSeek V4 Pro, Qwen, Llama 4 offer self-hosting
Gap: Limited small, efficient open models optimized for agentic coding

North Mini Code's Position:

mermaid

graph TD
    A[Developer Needs] --> B{Deployment Preference?}
    B -->|Cloud API| C[GPT-5.5, Claude Fable 5]
    B -->|Self-Hosted| D[North Mini Code, Qwen, DeepSeek]
    D --> E{Size Constraint?}
    E -->|Large OK| F[DeepSeek V4 Pro 236B]
    E -->|Small Preferred| G[North Mini Code 30B MoE]
    G --> H[3B Active - Efficient]

Technical Specifications

Model Architecture

North Mini Code at a Glance:

Specification	Details
Model Type	Mixture-of-Experts (MoE)
Total Parameters	30B
Active Parameters	3B (per forward pass)
License	Apache 2.0
Context Length	256K tokens total
Max Generation	64K tokens
Precision	FP8 (recommended)
Hardware Minimum	1× H100 GPU
Training Focus	Code generation, agentic workflows, terminal tasks

Mixture-of-Experts Efficiency

How MoE Works:

python

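# Conceptual MoE forward pass (illustrative sketch, not Cohere's implementation)
import numpy as np

def moe_forward(tokens, router_w, experts, k=2):
    """tokens: (n, d) floats; router_w: (d, n_experts); experts: list of (d, d) matrices."""
    # Router scores every token against every expert (learned routing)
    logits = tokens @ router_w

    # Only activate the top-k experts per token
    top_k = np.argsort(logits, axis=-1)[:, -k:]

    output = np.zeros_like(tokens)
    for i, token in enumerate(tokens):
        # Softmax over the selected experts' scores gives the mixing weights
        scores = logits[i, top_k[i]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()

        # Sparse activation: only k expert networks run for this token
        for e, w in zip(top_k[i], weights):
            output[i] += w * (token @ experts[e])

    return output  # Only the selected experts' parameters are used per token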

Benefits:

✅ Efficiency: Only 10% of parameters active per forward pass
✅ Speed: Faster inference than 30B dense model
✅ Specialization: Different experts for different coding patterns
✅ Cost: Reduced memory and compute requirements
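
The routing step is where the sparsity comes from, so it helps to see it in isolation. The following is a minimal sketch using only the standard library; the function name `top_k_gate`, the eight-expert layout, and the example scores are all hypothetical and say nothing about how North Mini Code's router is actually configured.

```python
import math

def top_k_gate(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    # Indices of the k largest logits, highest first
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router scores for one token across 8 experts
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 1.2]
selected = top_k_gate(logits, k=2)

for expert_index, weight in selected:
    print(f"expert {expert_index}: weight {weight:.3f}")
```

Only the experts returned by the gate run for that token; the remaining six are skipped entirely, which is why compute scales with active rather than total parameters.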

Context Window and Generation

256K Context Window:

Entire Codebases: Fit large projects in context
Multi-File Reasoning: Understand dependencies across files
Long Debugging Sessions: Maintain context through extended interactions
Documentation: Include extensive API references and docs

64K Maximum Generation:

Complete Modules: Generate full feature implementations
Comprehensive Refactors: Rewrite large sections of code
Detailed Explanations: Provide thorough code walkthroughs
Test Suites: Generate extensive test coverage

Comparison:

Model	Context Window	Max Generation
North Mini Code	256K	64K
GPT-4o	128K	16K
Claude 3.5 Sonnet	200K	8K
Devstral Small 2	128K	32K
DeepSeek V4 Coder	128K	8K
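
Because the 256K window is shared between prompt and output, a long prompt can reduce the room left for generation. The sketch below shows one way to check that budget before sending a request. The constants come from the specification table (treating 256K as 256,000 is an assumption), the four-characters-per-token ratio is a rough heuristic rather than the model's tokenizer, and the helper names are hypothetical.

```python
CONTEXT_WINDOW = 256_000  # total tokens (prompt + generation); exact value is an assumption
MAX_GENERATION = 64_000   # maximum generated tokens, per the spec table

def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; use the real tokenizer for anything precise."""
    return int(len(text) / chars_per_token) + 1

def generation_budget(prompt_tokens):
    """Tokens left for generation once the prompt is in context, capped at MAX_GENERATION."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(remaining, MAX_GENERATION))

# A 100K-token prompt leaves the full generation allowance available,
# while a 230K-token prompt leaves only the remainder of the window.
print(generation_budget(100_000))
print(generation_budget(230_000))
```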

Performance Benchmarks

Agentic Software Engineering Scores

Cohere evaluated North Mini Code on industry-standard benchmarks for agentic coding capabilities:

SWE-Bench Performance:

Benchmark	North Mini Code	Competitor Range*
SWE-Bench Verified	Competitive**	15-35% (similar size)
SWE-Bench Pro	Competitive**	10-25% (similar size)
Terminal-Bench v2	Competitive**	35-55% (similar size)
Terminal-Bench Hard	Competitive**	20-40% (similar size)

*Competitor range includes Gemma 4 E4B, DeepSeek Coder 7B, Qwen2.5 Coder 32B, and similar-sized models.
**Cohere reports "competitive scores" without disclosing exact percentages.

Artificial Analysis Coding Index:

Score: 33.4
Interpretation: Aggregated performance across coding benchmarks
Positioning: Competitive among similarly sized open-source models

What These Benchmarks Measure

SWE-Bench Verified:

Real GitHub issues from popular repositories
Requires generating patches that pass existing test suites
Tests understanding of complex codebases
Measures production-ready code generation

Terminal-Bench 2.0:

89 carefully curated tasks across diverse domains
Multi-step terminal workflows
System administration, ML, security, biology tasks
Tests agentic planning and tool use

Terminal-Bench Hard:

More challenging subset of Terminal-Bench tasks
Requires advanced reasoning and error recovery
Tests long-horizon task completion
Evaluates robustness to failures

Evaluation Harnesses Used:

python

# Harness used for each benchmark
EVALUATION_HARNESSES = {
    "SWE-Bench": "SWE-agent",  # Standard SWE-Bench harness
    "Terminal-Bench v2": "ReAct + single terminal tool",  # Simple reasoning loop
    "Terminal-Bench Hard": "Terminus-2",  # Advanced multi-tool harness
}

Speed and Efficiency Advantages

Cohere conducted internal performance testing comparing North Mini Code to Devstral Small 2 (Mistral's small coding model) under identical hardware and concurrency conditions.

Output Throughput: 2.8x Advantage

Test Configuration:

Hardware: Identical GPU setup (1× H100)
Concurrency: High and low concurrency levels tested
Workload: Real-world coding prompts
Precision: FP8 for both models

Results:

Metric	North Mini Code	Devstral Small 2	Advantage
Output Throughput (High Concurrency)	2.8x baseline	1.0x baseline	+180%
Output Throughput (Low Concurrency)	2.5x baseline	1.0x baseline	+150%
Inter-Token Latency	30% lower	Baseline	+30%
Time-to-First-Token (TTFT)	Slightly slower	Baseline	-5%

Practical Implications:

snippet

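Illustrative example: generating a 1,000-token code file, assuming a single request speeds up by the full 2.8x (the reported figure is aggregate throughput under concurrency, so per-request gains may be smaller)

Devstral Small 2:  ~10 seconds
North Mini Code:   ~3.6 seconds  (10 / 2.8)

In a development session generating 10 files:
Devstral Small 2:  ~100 seconds
North Mini Code:   ~36 seconds   (saves ~64 seconds)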
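
The arithmetic behind this example can be reproduced in a few lines. This is a minimal sketch: the 100 tokens/second baseline is an assumption implied by the 10-second figure, and it treats the 2.8x throughput gain as a per-request speedup, which is the optimistic case.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second

BASELINE_TPS = 100.0  # assumed: implied by 1,000 tokens in ~10 seconds
SPEEDUP = 2.8         # upper bound reported by Cohere's internal tests

baseline = generation_seconds(1_000, BASELINE_TPS)
faster = generation_seconds(1_000, BASELINE_TPS * SPEEDUP)
saved_over_10_files = 10 * (baseline - faster)

print(f"baseline: {baseline:.1f}s, faster: {faster:.1f}s, "
      f"saved over 10 files: {saved_over_10_files:.0f}s")
```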

Inter-Token Latency Improvements

Inter-token latency is the time between consecutive output tokens during generation. Lower, steadier values make streamed output smoother and shorten total completion time.

30% Improvement:

Smoother Streaming: More consistent token delivery
Better UX: Reduces perceived "stuttering" during generation
Predictable Performance: More reliable latency characteristics
Higher Throughput: Shorter gaps between tokens add up to faster completions

Time-to-First-Token Trade-off

TTFT (Time-to-First-Token):

North Mini Code: Slightly slower than Devstral Small 2 (~5% difference)
Devstral Small 2: Maintains edge in prompt processing speed

Why the Trade-off Exists:

snippet

MoE Architecture Impact:
- Routing overhead: Selecting which experts to activate
- Sparse activation: Coordinating distributed experts
- Offset by: Massive throughput gains during generation

Net Result: Slightly slower to start, much faster to complete

When It Matters:

⚠️ Short Completions: TTFT dominates total time
✅ Long Completions: Throughput gains overwhelm TTFT delay
✅ Batch Processing: Amortized startup cost negligible
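
A simple latency model makes the trade-off concrete: total time is the first-token delay plus one inter-token gap for each later token. The sketch below finds the output length at which the slower-starting model catches up. The absolute millisecond values are hypothetical; only the ratios (about 5% slower TTFT, 30% lower inter-token latency) come from the figures above.

```python
def total_latency_ms(ttft_ms, itl_ms, output_tokens):
    """First-token delay plus one inter-token gap per subsequent token."""
    return ttft_ms + itl_ms * max(output_tokens - 1, 0)

def break_even_tokens(ttft_a, itl_a, ttft_b, itl_b):
    """Smallest output length at which model A finishes no later than model B.

    Returns None if A never catches up (its tokens are not faster than B's).
    """
    if ttft_a <= ttft_b:
        return 1
    if itl_a >= itl_b:
        return None
    n = 1
    while total_latency_ms(ttft_a, itl_a, n) > total_latency_ms(ttft_b, itl_b, n):
        n += 1
    return n

# Hypothetical absolute numbers: B starts in 200 ms with 10 ms per token,
# A starts 5% later (210 ms) but emits tokens 30% faster (7 ms per token).
crossover = break_even_tokens(ttft_a=210, itl_a=7, ttft_b=200, itl_b=10)
print(crossover)
```

Under these assumed numbers the crossover arrives after only a handful of tokens, which is why the TTFT penalty mainly matters for very short completions.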

Agentic Coding Capabilities

North Mini Code is specifically optimized for agentic software engineering workflows—multi-step, autonomous coding tasks that go beyond simple code completion.

Sub-Agent Orchestration

Capability: Understand and coordinate multiple specialized sub-agents

Example Workflow:

python

# North Mini Code orchestrating sub-agents
main_agent_prompt = """
Task: Implement user authentication system

Sub-agents to coordinate:
1. Database Schema Agent: Design user tables and indexes
2. API Endpoint Agent: Create REST endpoints for auth
3. Frontend Form Agent: Build login/signup components
4. Security Review Agent: Audit for common vulnerabilities
5. Test Generation Agent: Write integration tests

Orchestrate these agents to build a complete auth system.
"""

# North Mini Code response demonstrates:
# - Planning coordination between agents
# - Passing outputs from one agent to next
# - Validating intermediate results
# - Error recovery when sub-agent fails
# - Final integration of all components

Why This Matters:

Modern agentic coding involves teams of specialized agents
Main agent must understand dependencies and sequencing
Requires reasoning about agent capabilities and outputs
Critical for scaling to complex software projects
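
The dependency and sequencing point can be sketched with the standard library's `graphlib`. Everything here is hypothetical: the agent names, the dependency graph, and the stub `run_agent` function stand in for real sub-agent calls and are not part of any Cohere or OpenCode API.

```python
from graphlib import TopologicalSorter

def run_agent(name, upstream):
    """Stub sub-agent: a real one would call a model with the upstream outputs."""
    return f"{name} done (inputs: {sorted(upstream)})"

# Each agent maps to the set of agents whose output it needs (assumed dependencies)
DEPENDENCIES = {
    "schema": set(),
    "api": {"schema"},
    "frontend": {"api"},
    "security_review": {"api", "frontend"},
    "tests": {"api", "frontend"},
}

def orchestrate(dependencies, runner=run_agent):
    """Run every agent after its dependencies, passing upstream results along."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies[name]}
        results[name] = runner(name, upstream)
    return results

results = orchestrate(DEPENDENCIES)
order = list(results)  # dict insertion order equals execution order
print(order)
```

A topological order guarantees that no agent starts before the outputs it depends on exist; a production orchestrator would add validation and retry logic around each `runner` call.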

Systems Architecture Mapping

Capability: Analyze existing systems and map their architecture

Example Use Case:

python

prompt = """
Analyze this codebase and provide:
1. High-level architecture diagram (components and relationships)
2. Data flow between modules
3. External dependencies and integrations
4. Potential bottlenecks or anti-patterns
5. Recommendations for refactoring

[Codebase context with 50+ files]
"""

# North Mini Code can:
# - Parse relationships across multiple files
# - Identify architectural patterns (MVC, microservices, etc.)
# - Trace data flow through the system
# - Detect design issues (circular dependencies, tight coupling)
# - Suggest improvements aligned with best practices

Applications:

Legacy Code Understanding: Onboarding to unfamiliar codebases
Refactoring Planning: Identifying modules to redesign
Microservices Migration: Mapping monoliths to service boundaries
Documentation: Auto-generating architecture docs

Code Review Automation

Capability: Perform comprehensive code reviews like a senior engineer

Review Dimensions:

python

code_review_prompt = """
Review this pull request for:

1. Code Quality
   - Readability and maintainability
   - Adherence to style guides
   - Proper naming conventions

2. Correctness
   - Logic errors and edge cases
   - Proper error handling
   - Input validation

3. Performance
   - Algorithmic complexity
   - Resource usage (memory, I/O)
   - Database query optimization

4. Security
   - SQL injection, XSS vulnerabilities
   - Authentication and authorization
   - Sensitive data handling

5. Testing
   - Test coverage adequacy
   - Missing test scenarios
   - Test quality and clarity

[Pull request diff]
"""

North Mini Code Output:

Specific line-by-line comments
Suggested improvements with code examples
Security vulnerability identification
Performance optimization recommendations
Test case suggestions

Deployment Options and Availability

Multiple Access Channels

1. Hugging Face (Weights Download)

Free access to model weights for self-deployment:

bash

# Download via Hugging Face CLI (confirm the exact repository ID on the model card)
huggingface-cli download cohere/north-mini-code-1.0 \
  --local-dir ./north-mini-code

python

# Load with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cohere/north-mini-code-1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",  # use the dtype stored in the checkpoint; "float8" is not a valid value
)

# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

2. Cohere Model Vault

Fully managed inference platform:

python

import cohere

# Uses the v2 Chat API; co.generate is a legacy endpoint
co = cohere.ClientV2(api_key="your-cohere-api-key")

response = co.chat(
    model="north-mini-code-1.0",
    messages=[
        {"role": "user", "content": "Create a REST API endpoint for user authentication"}
    ],
    max_tokens=2048,
    temperature=0.2,
)

print(response.message.content[0].text)

Benefits:

Zero infrastructure management
Automatic scaling
Built-in monitoring and logging
Optimized inference performance
Pay-per-use pricing

3. Cohere API

Direct API access for integration:

bash

curl https://api.cohere.com/v2/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "north-mini-code-1.0",
    "messages": [
      {"role": "user", "content": "Implement binary search in Python"}
    ],
    "max_tokens": 1024,
    "temperature": 0.1
  }'

4. OpenRouter

Multi-provider access platform:

python

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)

completion = client.chat.completions.create(
    model="cohere/north-mini-code-1.0",
    messages=[
        {"role": "system", "content": "You are an expert programmer."},
        {"role": "user", "content": "Write a SQL query to find duplicate emails"}
    ]
)

print(completion.choices[0].message.content)

OpenCode Compatibility

North Mini Code is specifically optimized for the OpenCode agent harness:

What is OpenCode?

Open-source agentic coding framework
Similar architecture to Claude Code, Codex CLI
Supports terminal access, file operations, web search
Extensible via plugins and custom tools

Integration Example:

bash

# Install OpenCode (npm package: opencode-ai)
npm install -g opencode-ai

# Store provider credentials (interactive prompt)
opencode auth login

# Run agentic coding session
# The provider/model ID below is an assumption; check your provider's model listing
opencode run --model cohere/north-mini-code-1.0 "Build a FastAPI app with SQLite database"

# North Mini Code will:
# 1. Plan the project structure
# 2. Create necessary files
# 3. Write FastAPI endpoints
# 4. Set up SQLite connection
# 5. Generate tests
# 6. Validate everything works

Why OpenCode Matters:

Provides agent harness for real-world tasks
Enables agentic workflows North Mini Code was trained for
Open-source alternative to proprietary coding agents
Community-driven development and extensions

Use Cases and Applications

1. Rapid Prototyping

Scenario: Startup needs to validate product idea quickly

python

task = """
Build a minimal viable product for a task management app:
- REST API (FastAPI)
- SQLite database with tasks table
- CRUD operations (create, read, update, delete tasks)
- Basic authentication
- React frontend with task list and add/edit forms
- Deploy script for Vercel (backend) and Netlify (frontend)
"""

# Illustrative output for a task like this (hypothetical figures, not a measured result):
# - Complete backend with 5 endpoints
# - Database schema and migrations
# - Frontend with 3 components
# - Integration tests
# - Deployment configurations
# Total: ~2,000 lines of code in minutes, still subject to review and testing

Time Savings:

Traditional development: 2-3 days
With North Mini Code: 1-2 hours (review + testing)
Savings: 90%+ reduction in initial development time

2. Legacy Codebase Modernization

Scenario: Enterprise migrating from Python 2.7 to Python 3.12

python

task = """
Analyze this Python 2.7 codebase and:
1. Identify compatibility issues with Python 3.12
2. Generate migration plan with risk assessment
3. Refactor critical modules first (auth, database)
4. Update tests to pass with Python 3.12
5. Document breaking changes and migration steps
"""

# Illustrative outcome (hypothetical figures):
# - Scans 50,000+ lines of legacy code
# - Identifies 200+ compatibility issues
# - Prioritizes 15 critical modules
# - Generates refactored code with Python 3.12 features
# - Creates comprehensive migration documentation

Benefits:

Handles tedious but critical migration work
Identifies hidden dependencies and issues
Suggests modern Python patterns and features
Reduces manual migration errors

3. Security Audit Automation

Scenario: Security team needs to audit microservices for vulnerabilities

python

task = """
Audit these 10 microservices for security issues:
- SQL injection vulnerabilities
- XSS attack vectors
- Authentication bypass possibilities
- Authorization logic flaws
- Sensitive data exposure
- CSRF vulnerabilities
- Dependency vulnerabilities (outdated packages)

Provide:
- Severity ratings (Critical/High/Medium/Low)
- Exploit scenarios
- Remediation code examples
- Priority order for fixes
"""

# Illustrative output (hypothetical figures):
# - Comprehensive security report
# - 23 identified vulnerabilities across services
# - PoC exploit code for critical issues
# - Fixed code samples for each vulnerability
# - Dependency upgrade recommendations

Impact:

Automates first-pass security review
Identifies issues human reviewers might miss
Provides actionable remediation guidance
Scales security reviews across large codebases

4. Test Suite Generation

Scenario: Open-source project lacks comprehensive tests

python

task = """
Generate comprehensive test suite for this library:
- Unit tests for all public functions
- Integration tests for workflows
- Edge case coverage (null inputs, boundary conditions)
- Performance regression tests
- Mocking for external dependencies
- Test fixtures and helpers
- Achieve >90% code coverage
"""

# Illustrative output (hypothetical figures):
# - 150+ test cases covering all modules
# - Pytest fixtures for common test scenarios
# - Mock implementations for external APIs
# - Property-based tests using Hypothesis
# - Performance benchmarks
# - CI/CD integration (GitHub Actions)

Value:

Improves code quality through comprehensive testing
Catches regressions before deployment
Documents expected behavior through tests
Reduces manual test writing time by 80%+

Sovereign AI and On-Premises Deployment

Why Sovereign AI Matters

Definition: Sovereign AI means owning and controlling your AI infrastructure—models, data, and deployment—without dependencies on external providers.

Key Principles:

Data Sovereignty: Training data and inference queries stay on-premises
Model Control: Full access to model weights, no black boxes
Deployment Flexibility: Run anywhere (on-prem, private cloud, edge)
No Vendor Lock-in: Switch providers or self-host without migration pain

North Mini Code's Sovereign Advantages

1. Apache 2.0 License

snippet

Permissions:
✅ Commercial use
✅ Modification and derivatives
✅ Distribution
✅ Private use
✅ Patent use

Conditions:
- Include original license and copyright notice
- State changes if you modify the code

Limitations:
❌ No liability
❌ No warranty
❌ Trademark use restrictions

What This Means:

Deploy commercially without licensing fees
Modify model for domain-specific needs
Create proprietary derivatives (closed-source OK)
No usage restrictions or rate limits

2. Self-Hosting on Modest Hardware

Minimum Requirements:

snippet

GPU: 1× NVIDIA H100 (80GB VRAM)
Precision: FP8 (8-bit floating point)
Memory: ~30GB VRAM for model weights (30B parameters × 1 byte each at FP8), plus KV cache whose size depends on context length and concurrency
Disk: ~30GB for model storage

Cost:
Cloud (single-H100 instance): ~$10-15/hour
On-prem H100 server: ~$35,000 one-time (amortized over years)

Cost Comparison (1 year, heavy usage):

Option	Setup Cost	Running Cost (1 year)	Total
OpenAI API	$0	$50,000+	$50,000+
Cohere API	$0	$30,000+	$30,000+
Cloud H100	$0	$87,600 (24/7)	$87,600
On-Prem H100	$35,000	$5,000 (power + maintenance)	$40,000

Assumes heavy inference workload (1M tokens/day output)
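
The table's own figures are enough to work out when buying hardware overtakes renting it. The sketch below is a back-of-the-envelope calculation using only the numbers above ($10/hour cloud rate running 24/7, $35,000 setup, $5,000/year running cost); it ignores depreciation, staffing, and utilization below 100%, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365

def cumulative_cost(setup, yearly_running, years):
    """Total spend after a given number of years."""
    return setup + yearly_running * years

def break_even_years(setup_a, yearly_a, setup_b, yearly_b):
    """Years after which option A (higher setup, lower running cost) becomes cheaper than B."""
    return (setup_a - setup_b) / (yearly_b - yearly_a)

cloud_yearly = 10 * HOURS_PER_YEAR  # $10/hour, 24/7
on_prem_setup, on_prem_yearly = 35_000, 5_000

years = break_even_years(on_prem_setup, on_prem_yearly, 0, cloud_yearly)
print(f"cloud per year: ${cloud_yearly:,}")
print(f"on-prem breaks even after about {years * 12:.1f} months")
```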

3. Air-Gapped Deployment

For environments requiring complete network isolation:

bash

# Offline deployment workflow
# 1. Download model on internet-connected machine
huggingface-cli download cohere/north-mini-code-1.0 \
  --local-dir ./offline-model

# 2. Transfer to air-gapped environment (USB, secure transfer)
tar -czf north-mini-code.tar.gz ./offline-model
# ... physical transfer ...

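# 3. Extract and deploy on air-gapped server
tar -xzf north-mini-code.tar.gz
export HF_HUB_OFFLINE=1  # stop Hugging Face libraries from attempting network access
python deploy_offline.py --model-path ./offline-model  # deploy_offline.py is a placeholder for your own launch script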

# 4. Inference runs completely offline
# - No internet connectivity required
# - No telemetry or usage tracking
# - Complete data isolation
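
When weights travel on physical media, it is worth confirming that the archive arrived intact before deploying it. The sketch below is one way to do that with a SHA-256 checksum computed on both sides of the transfer; the helper names are hypothetical and this step is a suggested addition, not part of Cohere's documented workflow.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_transfer(path, expected_hex):
    """Compare a file's checksum with the value recorded before the transfer."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Record `sha256_of("north-mini-code.tar.gz")` on the connected machine, carry the hex string alongside the archive, and call `verify_transfer` on the air-gapped server before extracting.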

Use Cases:

Government and defense contractors
Healthcare (HIPAA compliance)
Financial institutions (regulatory requirements)
Trade secret protection (IP-sensitive companies)

Comparison: Sovereign vs. API-Based Models

Feature	North Mini Code (Sovereign)	GPT-5.5 / Claude Fable (API)
Data Privacy	Complete on-prem control	Data sent to provider
Customization	Full model access	Prompt-only customization
Cost Model	One-time hardware + power	Per-token pricing
Latency	Local (ms)	Network-dependent (100ms+)
Availability	Always (local)	Provider uptime-dependent
Compliance	Easier (data stays local)	Complex (third-party processors)
Rate Limits	None (your hardware)	Provider-imposed limits
Model Updates	Your choice when to update	Provider-controlled updates

Limitations and Considerations

1. Capability Gap vs. Frontier Models

While North Mini Code is competitive among similarly-sized models, it lags behind large frontier models:

Model	Size	SWE-Bench Verified (est.)	Note
GPT-5.5	Proprietary	~70%+	State-of-the-art
Claude Fable 5	Proprietary	~65%+	Excellent reasoning
DeepSeek V4 Pro	236B MoE (21B active)	~55-60%	Large open model
North Mini Code	30B MoE (3B active)	~25-35% (est.)	Small, efficient

Trade-off:

✅ Efficiency and cost vs. ⚠️ Lower absolute performance
✅ Self-hosting flexibility vs. ⚠️ Capability limitations

When to Choose North Mini Code:

Task complexity is moderate (not cutting-edge research)
Data sovereignty is critical
Cost optimization is priority
Local deployment is required

When to Choose Frontier Models:

Maximum capability needed
Complex reasoning required
Cost is secondary to performance
Cloud deployment acceptable

2. MoE Router Overhead

Mixture-of-Experts Trade-offs:

Benefits:

✅ Only 3B active parameters (10% of total)
✅ Faster inference than 30B dense model
✅ Specialist experts for different patterns

Drawbacks:

⚠️ Routing overhead (selecting experts)
⚠️ TTFT slightly slower than dense models
⚠️ Load imbalance if routing is skewed
⚠️ Memory still needed for full 30B model

Mitigation:

Optimize for long completions where throughput dominates
Use batching to amortize routing overhead
Monitor expert utilization for balanced routing
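
The memory drawback follows from simple arithmetic: every expert must be resident even though only a fraction runs per token. The sketch below estimates weight memory at two precisions. It counts weights only (no KV cache or activations) and uses decimal gigabytes; the function name is hypothetical.

```python
def weight_memory_gb(total_params, bytes_per_param):
    """Memory needed to hold all weights, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9

TOTAL_PARAMS = 30e9   # all experts stay loaded, even though ~3B are active per token
ACTIVE_PARAMS = 3e9

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)   # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)  # BF16: 2 bytes per parameter
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"FP8: {fp8_gb:.0f} GB, BF16: {bf16_gb:.0f} GB, active: {active_fraction:.0%}")
```

Compute cost tracks the 3B active parameters, while memory cost tracks the full 30B.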

3. OpenCode Ecosystem Maturity

North Mini Code is optimized for OpenCode, but the OpenCode ecosystem is still maturing:

Current State:

✅ Core functionality stable
✅ Compatible with major coding agents
⚠️ Limited plugin ecosystem vs. Claude Code/Codex
⚠️ Smaller community than proprietary alternatives
⚠️ Documentation still evolving

Workarounds:

North Mini Code also works with other harnesses (LangChain, CrewAI)
API access doesn't require OpenCode
Community is actively developing new tools

4. Benchmark Transparency

Cohere reports "competitive scores" without disclosing exact percentages:

What We Know:

✅ Scores 33.4 on Artificial Analysis Coding Index
✅ Internal comparisons show throughput advantages
⚠️ Exact SWE-Bench / Terminal-Bench scores not published
⚠️ Competitor comparisons limited

Why Transparency Matters:

Hard to evaluate true performance without exact numbers
Difficult to compare directly against other models
"Competitive" is subjective and vague

Community Action:

Independent benchmarking underway
Expect third-party evaluations soon
Early reports suggest mid-20s to mid-30s % on SWE-Bench Verified

Comparison to Competing Models

North Mini Code vs. Other Small Open Models

Model	Size	License	Context	SWE-Bench (est.)	Availability
North Mini Code	30B MoE (3B active)	Apache 2.0	256K	~30%	Hugging Face, API
Devstral Small 2	22B	Apache 2.0	128K	~28%	Hugging Face
DeepSeek Coder 7B	7B	MIT	128K	~22%	Hugging Face
Qwen2.5 Coder 32B	32B	Apache 2.0	128K	~35%	Hugging Face
Gemma 4 E4B	27B	Gemma License	128K	~25%	Hugging Face

North Mini Code Advantages:

✅ Larger context window (256K vs. 128K)
✅ Higher throughput (2.8x vs. Devstral)
✅ Optimized for agentic workflows
✅ Strong inter-token latency

Competitor Advantages:

⚠️ Qwen2.5 Coder: Slightly higher absolute scores
⚠️ DeepSeek Coder: Smaller, easier to run
⚠️ Devstral: Better TTFT performance

North Mini Code vs. Frontier Coding Models

Feature	North Mini Code	Claude Fable 5	GPT-5.5 Codex
Deployment	Self-hosted	API only	API only
Cost (1M output tokens)	~$5-10 (self-hosted)	$50	$60
Data Privacy	Complete	Third-party	Third-party
Customization	Full model access	Prompt-level	Prompt-level
Performance	Moderate	Excellent	Excellent
Context	256K	200K	128K

When North Mini Code Wins:

Sovereign deployment requirement
Cost optimization critical
Data can't leave premises
Need customization beyond prompts

When Frontier Models Win:

Maximum capability needed
Cost is secondary concern
Cloud deployment acceptable
Cutting-edge features required

Getting Started with North Mini Code

Quick Start: API Access

1. Get Cohere API Key

bash

# Sign up at https://cohere.ai
# Navigate to API Keys section
# Create new key for North Mini Code

2. Install SDK

bash

pip install cohere

3. Generate Code

python

import cohere

# Uses the v2 Chat API; co.generate is a legacy endpoint
co = cohere.ClientV2(api_key="YOUR_API_KEY")

response = co.chat(
    model="north-mini-code-1.0",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}
    ],
    max_tokens=512,
    temperature=0.2,
)

print(response.message.content[0].text)

Self-Hosting Guide

Prerequisites:

NVIDIA GPU with 80GB+ VRAM (H100, A100 80GB)
CUDA 12.0+
Python 3.10+

Step 1: Download Model

bash

# Install Hugging Face CLI
pip install huggingface_hub

# Download weights
huggingface-cli download cohere/north-mini-code-1.0 \
  --local-dir ./north-mini-code \
  --include "*.safetensors" "*.json"

Step 2: Install Inference Engine

bash

# Option A: vLLM (recommended for throughput)
pip install vllm

# Option B: TGI (Text Generation Inference)
docker pull ghcr.io/huggingface/text-generation-inference:latest

# Option C: Transformers (simple but slower)
pip install transformers accelerate

Step 3: Launch Server

bash

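# Using vLLM
# --dtype has no float8 option; FP8 is selected with --quantization fp8
# (omit that flag if the checkpoint is already FP8-quantized)
# --served-model-name sets the model name that clients send in requests
python -m vllm.entrypoints.openai.api_server \
  --model ./north-mini-code \
  --served-model-name north-mini-code \
  --dtype auto \
  --quantization fp8 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 256000 \
  --port 8000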

# Server runs at http://localhost:8000

Step 4: Query Server

bash

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "north-mini-code",
    "prompt": "def quick_sort(arr):",
    "max_tokens": 512,
    "temperature": 0.1
  }'
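
The same request can be sent from Python without extra dependencies. This is a minimal sketch against the OpenAI-compatible completions route shown in the curl command; the helper names are hypothetical, and the default URL and model name assume the server was launched locally on port 8000 and serves the model as `north-mini-code`.

```python
import json
import urllib.request

def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="north-mini-code", max_tokens=512, temperature=0.1):
    """Build the POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the generated text (the server must be running)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server up, `print(complete("def quick_sort(arr):"))` prints the model's continuation of the function.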

OpenCode Integration

Install OpenCode:

bash

npm install -g opencode-ai

Configure Backend:

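bash

# For Cohere API: store credentials interactively
# (the model ID to select, e.g. cohere/north-mini-code-1.0, is an assumption)
opencode auth login

For a self-hosted deployment, register the local OpenAI-compatible server as a custom provider in opencode.json (the provider key "local" is an arbitrary name):

json

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "local": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:8000/v1" },
      "models": { "north-mini-code": {} }
    }
  },
  "model": "local/north-mini-code"
}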

Run Agentic Task:

bash

# Interactive session
opencode

# One-shot task
opencode run "Build a Flask app with user authentication"

# One-shot task scoped to specific files (named in the prompt)
opencode run "Refactor the modules in src/ to use async/await"

Community and Ecosystem

Open Development Philosophy

Cohere is building North Mini Code in the open, with community feedback shaping the roadmap:

Community Channels:

🐦 Twitter/X: @CohereAI — Tag @Cohere to share builds
💬 Discord: Join official Cohere Discord server
🗨️ Reddit: r/CohereAI for discussions
🐙 GitHub: Report issues, contribute to ecosystem tools

Feedback Priorities:

Benchmark performance gaps
Real-world use case pain points
Feature requests for future releases
Integration compatibility issues

Roadmap and Future Models

What's Next:

North Mini Code is the first, not the last
Larger models coming: More powerful variants in development
Specialized models: Domain-specific coding models (e.g., ML, frontend, systems)
Fine-tuning support: Official fine-tuning guides and tooling
Quantization options: 4-bit, 2-bit variants for edge deployment

Community-Driven Priorities:

Most-requested benchmarks will be prioritized
Integration gaps will be addressed
Documentation expanded based on common questions

Contributing to the Ecosystem

How to Help:

Benchmark and Report: Run North Mini Code on your tasks, share results
Build Tools: Create OpenCode plugins, integrations, workflows
Write Guides: Tutorial content for common use cases
Identify Issues: Report bugs, limitations, compatibility problems
Showcase Projects: Share what you build to inspire others

Recognition:

Featured projects highlighted on Cohere blog
Community contributors credited in updates
Ecosystem tools promoted through official channels

Sources and References

Official Resources

Announcement:

Cohere Blog: North Mini Code Launch
Published: June 9, 2026

Technical Documentation:

Benchmarks:

Artificial Analysis Coding Index: 33.4
Internal testing vs. Devstral Small 2 (throughput, latency)
Evaluations on SWE-Bench, Terminal-Bench (harness-specific)

North Mini Code was launched by Cohere on June 9, 2026 as its first open-source agentic coding model under the Apache 2.0 license. A 30B-parameter mixture-of-experts model with 3B active parameters, it delivers competitive performance on SWE-Bench and Terminal-Bench and, in Cohere's internal tests, up to 2.8x higher output throughput than Devstral Small 2. That combination positions it as a sovereign AI option for developers who require on-premises deployment and complete control over their AI infrastructure.
