Cohere North Mini Code: Open-Source Agentic Coding Model (Apache 2.0)
Cohere launches North Mini Code, a 30B parameter MoE model (3B active) built for agentic software engineering, achieving 2.8x throughput vs Devstral Small 2 under Apache 2.0 license.
TL;DR: On June 9, 2026, Cohere launched North Mini Code—their first open-source agentic coding model under Apache 2.0 license. A 30B parameter mixture-of-experts (MoE) architecture with just 3B active parameters, North Mini Code delivers competitive performance on SWE-Bench Verified, SWE-Bench Pro, and Terminal-Bench 2.0 benchmarks while achieving up to 2.8x higher output throughput than Devstral Small 2. Built for sovereign AI deployment, it supports 256K context, requires just 1× H100 GPU at FP8, and is optimized for agentic software engineering workflows including sub-agent orchestration, systems architecture mapping, and code reviews.
Cohere's Entry into Open-Source Developer Models
After establishing itself with enterprise-focused models like Command A+, Cohere is expanding into the developer ecosystem with North Mini Code—the first model in their next generation of powerful, open-source AI systems.
The Sovereign AI Positioning
Core Mission: Enable developers to deploy agentic coding capabilities on their own terms—on-premises, locally, or in private clouds—without vendor lock-in or usage restrictions.
Market Context:
Closed Ecosystem: GPT-5.5, Claude Fable 5, Gemini 3.5 require API access
Open Alternatives: DeepSeek V4 Pro, Qwen, Llama 4 offer self-hosting
Gap: Limited small, efficient open models optimized for agentic coding
North Mini Code's Position:
graph TD
A[Developer Needs] --> B{Deployment Preference?}
B -->|Cloud API| C[GPT-5.5, Claude Fable 5]
B -->|Self-Hosted| D[North Mini Code, Qwen, DeepSeek]
D --> E{Size Constraint?}
E -->|Large OK| F[DeepSeek V4 Pro 236B]
E -->|Small Preferred| G[North Mini Code 30B MoE]
G --> H[3B Active - Efficient]
Cohere conducted internal performance testing comparing North Mini Code to Devstral Small 2 (Mistral's small coding model) under identical hardware and concurrency conditions.
Output Throughput: 2.8x Advantage
Test Configuration:
Hardware: Identical GPU setup (1× H100)
Concurrency: High and low concurrency levels tested
Workload: Real-world coding prompts
Precision: FP8 for both models
Results:
| Metric | North Mini Code | Devstral Small 2 | Advantage |
| --- | --- | --- | --- |
| Output Throughput (High Concurrency) | 2.8x baseline | 1.0x baseline | +180% |
| Output Throughput (Low Concurrency) | 2.5x baseline | 1.0x baseline | +150% |
| Inter-Token Latency | 30% lower | Baseline | +30% |
| Time-to-First-Token (TTFT) | Slightly slower | Baseline | -5% |
Practical Implications:
Example: Generating a 1,000-token code file
Devstral Small 2: ~10 seconds
North Mini Code: ~3.6 seconds (2.8x faster)
In a development session generating 10 files:
Devstral Small 2: ~100 seconds
North Mini Code: ~36 seconds (saves ~64 seconds)
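The arithmetic behind this example can be reproduced with a short sketch. The baseline rate of 100 tokens per second is an assumption, chosen only so that a 1,000-token file takes about 10 seconds on the baseline; the 2.8x multiplier is the high-concurrency figure reported above.

```python
def generation_time(tokens: int, baseline_tokens_per_s: float, speedup: float = 1.0) -> float:
    """Seconds needed to emit `tokens` at a baseline rate scaled by `speedup`."""
    return tokens / (baseline_tokens_per_s * speedup)


# Assumed baseline rate (hypothetical): 1,000 tokens in roughly 10 seconds.
BASELINE_TPS = 100.0

devstral_s = generation_time(1_000, BASELINE_TPS)
north_s = generation_time(1_000, BASELINE_TPS, speedup=2.8)

print(f"Per file: {devstral_s:.1f} s vs {north_s:.1f} s")
print(f"Saved over 10 files: {10 * (devstral_s - north_s):.0f} s")
```

Real sessions also include prompt processing and tool calls, so this covers only the decoding portion of the time.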
Inter-Token Latency Improvements
Inter-Token Latency measures the consistency and pacing of token generation—critical for smooth streaming and user experience.
30% Improvement:
Smoother Streaming: More consistent token delivery
Better UX: Reduces perceived "stuttering" during generation
Predictable Performance: More reliable latency characteristics
Higher Throughput: Shorter gaps between tokens add up to faster completion of long outputs
Time-to-First-Token Trade-off
TTFT (Time-to-First-Token):
North Mini Code: Slightly slower than Devstral Small 2 (~5% difference)
Devstral Small 2: Maintains edge in prompt processing speed
Why the Trade-off Exists:
MoE Architecture Impact:
- Routing overhead: Selecting which experts to activate
- Sparse activation: Coordinating distributed experts
- Offset by: Massive throughput gains during generation
Net Result: Slightly slower to start, much faster to complete
When It Matters:
⚠️ Short Completions: TTFT dominates total time
✅ Long Completions: Throughput gains overwhelm TTFT delay
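One way to reason about this trade-off is a simple latency model: total time is TTFT plus the number of tokens multiplied by the inter-token latency. The sketch below finds the completion length at which the model with the slower start overtakes the other. The absolute timings (200 ms TTFT, 10 ms per token for the baseline) are hypothetical; only the relative differences (about 5% slower TTFT, 30% lower inter-token latency) come from the figures above.

```python
def total_latency(n_tokens: int, ttft_s: float, itl_s: float) -> float:
    """Simple model: time to first token plus a fixed gap per generated token."""
    return ttft_s + n_tokens * itl_s


def break_even_tokens(ttft_a: float, itl_a: float, ttft_b: float, itl_b: float) -> float:
    """Completion length above which model B (slower start, faster tokens) finishes first."""
    if itl_b >= itl_a:
        raise ValueError("model B must have the lower inter-token latency")
    return max(0.0, (ttft_b - ttft_a) / (itl_a - itl_b))


# Hypothetical baseline timings; the ratios follow the reported relative results.
base_ttft, base_itl = 0.200, 0.010
north_ttft, north_itl = base_ttft * 1.05, base_itl * 0.70

n = break_even_tokens(base_ttft, base_itl, north_ttft, north_itl)
print(f"Break-even completion length: about {n:.1f} tokens")
```

Under these assumed timings the TTFT penalty is recovered within a few tokens; with a larger prompt-processing gap the break-even point moves out proportionally.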
North Mini Code is specifically optimized for agentic software engineering workflows—multi-step, autonomous coding tasks that go beyond simple code completion.
Sub-Agent Orchestration
Capability: Understand and coordinate multiple specialized sub-agents
Example Workflow:
# North Mini Code orchestrating sub-agents
main_agent_prompt = """
Task: Implement user authentication system
Sub-agents to coordinate:
1. Database Schema Agent: Design user tables and indexes
2. API Endpoint Agent: Create REST endpoints for auth
3. Frontend Form Agent: Build login/signup components
4. Security Review Agent: Audit for common vulnerabilities
5. Test Generation Agent: Write integration tests
Orchestrate these agents to build a complete auth system.
"""
# North Mini Code response demonstrates:
# - Planning coordination between agents
# - Passing outputs from one agent to next
# - Validating intermediate results
# - Error recovery when sub-agent fails
# - Final integration of all components
Why This Matters:
Modern agentic coding involves teams of specialized agents
Main agent must understand dependencies and sequencing
Requires reasoning about agent capabilities and outputs
Critical for scaling to complex software projects
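The dependency and sequencing problem can be illustrated with a minimal sketch that orders sub-agents topologically and passes each one the outputs it depends on. The dependency map and the `run_agent` placeholder are hypothetical; this is not how North Mini Code or any particular harness implements orchestration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each sub-agent lists the agents whose output it needs.
DEPENDENCIES = {
    "database_schema": set(),
    "api_endpoints": {"database_schema"},
    "frontend_forms": {"api_endpoints"},
    "security_review": {"api_endpoints", "frontend_forms"},
    "test_generation": {"api_endpoints", "frontend_forms"},
}


def run_agent(name: str, upstream: dict) -> str:
    """Placeholder for a real model call; reports which upstream outputs it received."""
    return f"{name} output (inputs: {sorted(upstream)})"


def orchestrate(dependencies: dict) -> tuple:
    """Run every agent after its dependencies, feeding it their results."""
    results = {}
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies[name]}
        results[name] = run_agent(name, upstream)
        order.append(name)
    return order, results


order, results = orchestrate(DEPENDENCIES)
print(" -> ".join(order))
```

A production orchestrator would add validation of intermediate results and retries when a sub-agent fails; the ordering step shown here is only the scheduling core.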
Systems Architecture Mapping
Capability: Analyze existing systems and map their architecture
Example Use Case:
prompt = """
Analyze this codebase and provide:
1. High-level architecture diagram (components and relationships)
2. Data flow between modules
3. External dependencies and integrations
4. Potential bottlenecks or anti-patterns
5. Recommendations for refactoring
[Codebase context with 50+ files]
"""
# North Mini Code can:
# - Parse relationships across multiple files
# - Identify architectural patterns (MVC, microservices, etc.)
# - Trace data flow through the system
# - Detect design issues (circular dependencies, tight coupling)
# - Suggest improvements aligned with best practices
Applications:
Legacy Code Understanding: Onboarding to unfamiliar codebases
Refactoring Planning: Identifying modules to redesign
Microservices Migration: Mapping monoliths to service boundaries
Documentation: Auto-generating architecture docs
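Some of these checks can also be done deterministically and used to cross-check the model's findings. The sketch below builds an import graph from Python sources with the standard library and reports a circular dependency if one exists. The `modules` mapping of module name to source text is a hypothetical input format, and only top-level absolute imports are considered.

```python
import ast
from graphlib import CycleError, TopologicalSorter


def imports_of(source: str) -> set:
    """Top-level package names imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def find_cycle(modules: dict):
    """Return one import cycle among the given modules, or None if the graph is acyclic."""
    graph = {name: imports_of(src) & set(modules) for name, src in modules.items()}
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        return err.args[1]  # list of nodes; first and last entries are the same
    return None


example = {"a": "import b", "b": "from a import helper", "c": "import os"}
print(find_cycle(example))
```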
Code Review Automation
Capability: Perform comprehensive code reviews like a senior engineer
Review Dimensions:
code_review_prompt = """
Review this pull request for:
1. Code Quality
- Readability and maintainability
- Adherence to style guides
- Proper naming conventions
2. Correctness
- Logic errors and edge cases
- Proper error handling
- Input validation
3. Performance
- Algorithmic complexity
- Resource usage (memory, I/O)
- Database query optimization
4. Security
- SQL injection, XSS vulnerabilities
- Authentication and authorization
- Sensitive data handling
5. Testing
- Test coverage adequacy
- Missing test scenarios
- Test quality and clarity
[Pull request diff]
"""
North Mini Code Output:
Specific line-by-line comments
Suggested improvements with code examples
Security vulnerability identification
Performance optimization recommendations
Test case suggestions
Deployment Options and Availability
Multiple Access Channels
1. Hugging Face (Weights Download)
Free access to model weights for self-deployment:
# Download via Hugging Face CLI
huggingface-cli download cohere/north-mini-code-1.0 \
--local-dir ./north-mini-code \
--local-dir-use-symlinks False
# Load with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"cohere/north-mini-code-1.0",
device_map="auto",
torch_dtype="auto"  # "float8" is not a valid dtype string; "auto" uses the precision stored in the checkpoint
)
tokenizer = AutoTokenizer.from_pretrained("cohere/north-mini-code-1.0")
# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
2. Cohere Model Vault
Fully managed inference platform:
import cohere
# Uses the SDK's v2 chat client
co = cohere.ClientV2(api_key="your-cohere-api-key")
response = co.chat(
model="north-mini-code-1.0",
messages=[{"role": "user", "content": "Create a REST API endpoint for user authentication"}],
max_tokens=2048,
temperature=0.2
)
print(response.message.content[0].text)
3. OpenRouter
OpenAI-compatible API access:
import openai
client = openai.OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="your-openrouter-key"
)
completion = client.chat.completions.create(
model="cohere/north-mini-code-1.0",
messages=[
{"role": "system", "content": "You are an expert programmer."},
{"role": "user", "content": "Write a SQL query to find duplicate emails"}
]
)
print(completion.choices[0].message.content)
OpenCode Compatibility
North Mini Code is specifically optimized for the OpenCode agent harness:
What is OpenCode?
Open-source agentic coding framework
Similar architecture to Claude Code, Codex CLI
Supports terminal access, file operations, web search
Extensible via plugins and custom tools
Integration Example:
# Install OpenCode
pip install opencode-agents
# Configure North Mini Code backend
opencode config set model cohere/north-mini-code-1.0
opencode config set api_key YOUR_COHERE_KEY
# Run agentic coding session
opencode run "Build a FastAPI app with SQLite database"
# North Mini Code will:
# 1. Plan the project structure
# 2. Create necessary files
# 3. Write FastAPI endpoints
# 4. Set up SQLite connection
# 5. Generate tests
# 6. Validate everything works
Enables agentic workflows North Mini Code was trained for
Open-source alternative to proprietary coding agents
Community-driven development and extensions
Use Cases and Applications
1. Rapid Prototyping
Scenario: Startup needs to validate product idea quickly
task = """
Build a minimal viable product for a task management app:
- REST API (FastAPI)
- SQLite database with tasks table
- CRUD operations (create, read, update, delete tasks)
- Basic authentication
- React frontend with task list and add/edit forms
- Deploy script for Vercel (backend) and Netlify (frontend)
"""
# North Mini Code generates:
# - Complete backend with 5 endpoints
# - Database schema and migrations
# - Frontend with 3 components
# - Integration tests
# - Deployment configurations
# Total: ~2,000 lines of production-ready code in minutes
Time Savings:
Traditional development: 2-3 days
With North Mini Code: 1-2 hours (review + testing)
Savings: 90%+ reduction in initial development time
2. Legacy Codebase Modernization
Scenario: Enterprise migrating from Python 2.7 to Python 3.12
task = """
Analyze this Python 2.7 codebase and:
1. Identify compatibility issues with Python 3.12
2. Generate migration plan with risk assessment
3. Refactor critical modules first (auth, database)
4. Update tests to pass with Python 3.12
5. Document breaking changes and migration steps
"""
# North Mini Code:
# - Scans 50,000+ lines of legacy code
# - Identifies 200+ compatibility issues
# - Prioritizes 15 critical modules
# - Generates refactored code with Python 3.12 features
# - Creates comprehensive migration documentation
Benefits:
Handles tedious but critical migration work
Identifies hidden dependencies and issues
Suggests modern Python patterns and features
Reduces manual migration errors
3. Security Audit Automation
Scenario: Security team needs to audit microservices for vulnerabilities
task = """
Audit these 10 microservices for security issues:
- SQL injection vulnerabilities
- XSS attack vectors
- Authentication bypass possibilities
- Authorization logic flaws
- Sensitive data exposure
- CSRF vulnerabilities
- Dependency vulnerabilities (outdated packages)
Provide:
- Severity ratings (Critical/High/Medium/Low)
- Exploit scenarios
- Remediation code examples
- Priority order for fixes
"""
# North Mini Code generates:
# - Comprehensive security report
# - 23 identified vulnerabilities across services
# - PoC exploit code for critical issues
# - Fixed code samples for each vulnerability
# - Dependency upgrade recommendations
4. Test Suite Generation
task = """
Generate comprehensive test suite for this library:
- Unit tests for all public functions
- Integration tests for workflows
- Edge case coverage (null inputs, boundary conditions)
- Performance regression tests
- Mocking for external dependencies
- Test fixtures and helpers
- Achieve >90% code coverage
"""
# North Mini Code creates:
# - 150+ test cases covering all modules
# - Pytest fixtures for common test scenarios
# - Mock implementations for external APIs
# - Property-based tests using Hypothesis
# - Performance benchmarks
# - CI/CD integration (GitHub Actions)
Value:
Improves code quality through comprehensive testing
Catches regressions before deployment
Documents expected behavior through tests
Reduces manual test writing time by 80%+
Sovereign AI and On-Premises Deployment
Why Sovereign AI Matters
Definition: Sovereign AI means owning and controlling your AI infrastructure—models, data, and deployment—without dependencies on external providers.
Key Principles:
Data Sovereignty: Training data and inference queries stay on-premises
Model Control: Full access to model weights, no black boxes
Deployment Flexibility: Run anywhere (on-prem, private cloud, edge)
No Vendor Lock-in: Switch providers or self-host without migration pain
North Mini Code's Sovereign Advantages
1. Apache 2.0 License
Permissions:
✅ Commercial use
✅ Modification and derivatives
✅ Distribution
✅ Private use
✅ Patent use
Conditions:
- Include original license and copyright notice
- State changes if you modify the code
Limitations:
❌ No liability
❌ No warranty
❌ Trademark use restrictions
What This Means:
Deploy commercially without licensing fees
Modify model for domain-specific needs
Create proprietary derivatives (closed-source OK)
No usage restrictions or rate limits
2. Self-Hosting on Modest Hardware
Minimum Requirements:
GPU: 1× NVIDIA H100 (80GB VRAM)
Precision: FP8 (8-bit floating point)
Memory: ~30GB VRAM for model weights at FP8 (all 30B parameters stay resident even though only 3B are active per token) + ~10GB for KV cache
Disk: ~30GB for model storage
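As a back-of-the-envelope check on these requirements, weight memory can be estimated as parameter count times bytes per parameter. This is a rough sketch: it ignores KV cache, activations, and runtime overhead, and it counts all 30B parameters because an MoE model keeps every expert in memory even though only a subset is active per token.

```python
def weight_memory_gb(total_params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in decimal GB (1e9 params x bytes / 1e9 bytes per GB)."""
    return total_params_billion * bytes_per_param


TOTAL_PARAMS_B = 30  # total parameters, not the 3B active per token

for label, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    print(f"{label:>5}: ~{weight_memory_gb(TOTAL_PARAMS_B, bytes_per_param):.0f} GB")
```

At FP8 the weights alone come to roughly 30 GB, which matches the disk estimate above and leaves room on an 80GB H100 for the KV cache that long contexts require.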
Cost:
Cloud (AWS p5.2xlarge): ~$10-15/hour
On-prem H100 server: ~$35,000 one-time (amortized over years)
Cost Comparison (1 year, heavy usage):
| Option | Setup Cost | Running Cost (1 year) | Total |
| --- | --- | --- | --- |
| OpenAI API | $0 | $50,000+ | $50,000+ |
| Cohere API | $0 | $30,000+ | $30,000+ |
| Cloud H100 | $0 | $87,600 (24/7) | $87,600 |
| On-Prem H100 | $35,000 | $5,000 (power + maintenance) | $40,000 |
Assumes heavy inference workload (1M tokens/day output)
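The cloud and on-prem rows follow from simple arithmetic, sketched below. The $10/hour rate is the low end of the cloud range quoted above, and the function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365


def cloud_cost(hourly_usd: float, years: float = 1.0) -> float:
    """Cost of renting a GPU around the clock."""
    return hourly_usd * HOURS_PER_YEAR * years


def on_prem_cost(hardware_usd: float, yearly_running_usd: float, years: float = 1.0) -> float:
    """One-time hardware purchase plus yearly power and maintenance."""
    return hardware_usd + yearly_running_usd * years


def break_even_years(hardware_usd: float, yearly_running_usd: float, hourly_usd: float) -> float:
    """Time after which owning the hardware becomes cheaper than renting 24/7."""
    return hardware_usd / (hourly_usd * HOURS_PER_YEAR - yearly_running_usd)


print(f"Cloud, 1 year:   ${cloud_cost(10):,.0f}")
print(f"On-prem, 1 year: ${on_prem_cost(35_000, 5_000):,.0f}")
print(f"Break-even:      {break_even_years(35_000, 5_000, 10) * 12:.1f} months")
```

The break-even point depends heavily on utilization: a cloud instance that runs only during working hours costs far less than the 24/7 figure, which pushes the break-even further out.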
3. Air-Gapped Deployment
For environments requiring complete network isolation:
# Offline deployment workflow
# 1. Download model on internet-connected machine
huggingface-cli download cohere/north-mini-code-1.0 \
--local-dir ./offline-model
# 2. Transfer to air-gapped environment (USB, secure transfer)
tar -czf north-mini-code.tar.gz ./offline-model
# ... physical transfer ...
# 3. Extract and deploy on air-gapped server
tar -xzf north-mini-code.tar.gz
python deploy_offline.py --model-path ./offline-model
# 4. Inference runs completely offline
# - No internet connectivity required
# - No telemetry or usage tracking
# - Complete data isolation
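A physical transfer is a natural point to verify that the archive arrived intact. The sketch below is an optional extra step, not part of the workflow described above: it computes a SHA-256 digest on the connected machine and checks it again on the air-gapped server. The function names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large archives never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: str, expected_hex: str) -> bool:
    """Compare the archive's digest with the value recorded before the transfer."""
    return sha256_of(path) == expected_hex.strip().lower()
```

In practice, record `sha256_of("north-mini-code.tar.gz")` before copying the archive, carry the digest separately from the media, and call `verify_transfer` before extracting.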
Use Cases:
Government and defense contractors
Healthcare (HIPAA compliance)
Financial institutions (regulatory requirements)
Trade secret protection (IP-sensitive companies)
Comparison: Sovereign vs. API-Based Models
| Feature | North Mini Code (Sovereign) | GPT-5.5 / Claude Fable (API) |
| --- | --- | --- |
| Data Privacy | Complete on-prem control | Data sent to provider |
| Customization | Full model access | Prompt-only customization |
| Cost Model | One-time hardware + power | Per-token pricing |
| Latency | Local (ms) | Network-dependent (100ms+) |
| Availability | Always (local) | Provider uptime-dependent |
| Compliance | Easier (data stays local) | Complex (third-party processors) |
| Rate Limits | None (your hardware) | Provider-imposed limits |
| Model Updates | Your choice when to update | Provider-controlled updates |
Limitations and Considerations
1. Capability Gap vs. Frontier Models
While North Mini Code is competitive among similarly-sized models, it lags behind large frontier models:
| Model | Size | SWE-Bench Verified (est.) | Note |
| --- | --- | --- | --- |
| GPT-5.5 | Proprietary | ~70%+ | State-of-the-art |
| Claude Fable 5 | Proprietary | ~65%+ | Excellent reasoning |
| DeepSeek V4 Pro | 236B MoE (21B active) | ~55-60% | Large open model |
| North Mini Code | 30B MoE (3B active) | ~25-35% (est.) | Small, efficient |
Trade-off:
✅ Efficiency and cost vs. ⚠️ Lower absolute performance
✅ Self-hosting flexibility vs. ⚠️ Capability limitations
When to Choose North Mini Code:
Task complexity is moderate (not cutting-edge research)
Data sovereignty is critical
Cost optimization is priority
Local deployment is required
When to Choose Frontier Models:
Maximum capability needed
Complex reasoning required
Cost is secondary to performance
Cloud deployment acceptable
2. MoE Router Overhead
Mixture-of-Experts Trade-offs:
Benefits:
✅ Only 3B active parameters (10% of total)
✅ Faster inference than 30B dense model
✅ Specialist experts for different patterns
Drawbacks:
⚠️ Routing overhead (selecting experts)
⚠️ TTFT slightly slower than dense models
⚠️ Load imbalance if routing is skewed
⚠️ Memory still needed for full 30B model
Mitigation:
Optimize for long completions where throughput dominates
Use batching to amortize routing overhead
Monitor expert utilization for balanced routing
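Monitoring expert utilization can be as simple as counting how often each expert is selected and comparing the busiest expert with the ideal uniform share. The sketch below assumes routing decisions have already been logged as a flat list of expert indices; how to extract them depends on the serving stack and is not shown.

```python
from collections import Counter


def expert_load(assignments: list, num_experts: int) -> list:
    """Fraction of routed tokens handled by each expert."""
    if not assignments:
        raise ValueError("no routing decisions recorded")
    counts = Counter(assignments)
    return [counts.get(expert, 0) / len(assignments) for expert in range(num_experts)]


def imbalance_ratio(load: list) -> float:
    """Busiest expert's share divided by the uniform share; 1.0 means perfectly balanced."""
    return max(load) * len(load)


# Hypothetical routing log: expert index chosen for each token.
routing_log = [0, 0, 1, 2, 3, 0, 1, 0]
load = expert_load(routing_log, num_experts=4)
print(load, imbalance_ratio(load))
```

A ratio that stays well above 1.0 on representative traffic suggests skewed routing, which is the load-imbalance drawback listed above.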
3. OpenCode Ecosystem Maturity
North Mini Code is optimized for OpenCode, but the OpenCode ecosystem is still maturing:
Current State:
✅ Core functionality stable
✅ Compatible with major coding agents
⚠️ Limited plugin ecosystem vs. Claude Code/Codex
⚠️ Smaller community than proprietary alternatives
⚠️ Documentation still evolving
Workarounds:
North Mini Code also works with other harnesses (LangChain, CrewAI)
API access doesn't require OpenCode
Community is actively developing new tools
4. Benchmark Transparency
Cohere reports "competitive scores" without disclosing exact percentages:
What We Know:
✅ Scores 33.4 on Artificial Analysis Coding Index
✅ Internal comparisons show throughput advantages
⚠️ Exact SWE-Bench / Terminal-Bench scores not published
⚠️ Competitor comparisons limited
Why Transparency Matters:
Hard to evaluate true performance without exact numbers
Difficult to compare directly against other models
"Competitive" is subjective and vague
Community Action:
Independent benchmarking underway
Expect third-party evaluations soon
Early reports suggest mid-20s to mid-30s % on SWE-Bench Verified
Comparison to Competing Models
North Mini Code vs. Other Small Open Models
| Model | Size | License | Context | SWE-Bench (est.) | Availability |
| --- | --- | --- | --- | --- | --- |
| North Mini Code | 30B MoE (3B active) | Apache 2.0 | 256K | ~30% | Hugging Face, API |
| Devstral Small 2 | 22B | Apache 2.0 | 128K | ~28% | Hugging Face |
| DeepSeek Coder 7B | 7B | MIT | 128K | ~22% | Hugging Face |
| Qwen2.5 Coder 32B | 32B | Apache 2.0 | 128K | ~35% | Hugging Face |
| Gemma 4 E4B | 27B | Gemma License | 128K | ~25% | Hugging Face |
North Mini Code Advantages:
✅ Larger context window (256K vs. 128K)
✅ Higher throughput (2.8x vs. Devstral)
✅ Optimized for agentic workflows
✅ Strong inter-token latency
Competitor Advantages:
⚠️ Qwen2.5 Coder: Slightly higher absolute scores
⚠️ DeepSeek Coder: Smaller, easier to run
⚠️ Devstral: Better TTFT performance
North Mini Code vs. Frontier Coding Models
| Feature | North Mini Code | Claude Fable 5 | GPT-5.5 Codex |
| --- | --- | --- | --- |
| Deployment | Self-hosted | API only | API only |
| Cost (1M output tokens) | ~$5-10 (self-hosted) | $50 | $60 |
| Data Privacy | Complete | Third-party | Third-party |
| Customization | Full model access | Prompt-level | Prompt-level |
| Performance | Moderate | Excellent | Excellent |
| Context | 256K | 200K | 128K |
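The self-hosted cost per million output tokens depends on hourly hardware cost and sustained aggregate throughput. The sketch below shows the relationship; the throughput values are hypothetical and are not measured figures for North Mini Code.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Hardware cost attributed to one million generated tokens at a sustained rate."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical aggregate throughput across all concurrent requests.
for tps in (300, 500):
    print(f"{tps} tok/s at $10/hour: ${cost_per_million_tokens(10, tps):.2f} per 1M tokens")
```

Under these assumptions, a $10/hour GPU needs to sustain a few hundred tokens per second to land in the $5-10 range; idle time raises the effective cost.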
When North Mini Code Wins:
Sovereign deployment requirement
Cost optimization critical
Data can't leave premises
Need customization beyond prompts
When Frontier Models Win:
Maximum capability needed
Cost is secondary concern
Cloud deployment acceptable
Cutting-edge features required
Getting Started with North Mini Code
Quick Start: API Access
1. Get Cohere API Key
# Sign up at https://cohere.ai
# Navigate to API Keys section
# Create new key for North Mini Code
2. Install SDK
pip install cohere
3. Generate Code
import cohere
# Uses the SDK's v2 chat client
co = cohere.ClientV2(api_key="YOUR_API_KEY")
response = co.chat(
model="north-mini-code-1.0",
messages=[{"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}],
max_tokens=512,
temperature=0.2,
stop_sequences=["```"]
)
print(response.message.content[0].text)
# For self-hosted deployment
opencode config set backend http://localhost:8000
opencode config set model north-mini-code
# For Cohere API
opencode config set backend cohere
opencode config set api_key YOUR_COHERE_KEY
opencode config set model north-mini-code-1.0
Run Agentic Task:
# Interactive session
opencode chat
# One-shot task
opencode run "Build a Flask app with user authentication"
# With specific files
opencode run --files src/*.py "Refactor these modules to use async/await"
Community and Ecosystem
Open Development Philosophy
Cohere is building North Mini Code in the open, with community feedback shaping the roadmap:
Community Channels:
🐦 Twitter/X: @CohereAI (tag @Cohere to share builds)
💬 Discord: Join official Cohere Discord server
🗨️ Reddit: r/CohereAI for discussions
🐙 GitHub: Report issues, contribute to ecosystem tools
Feedback Priorities:
Benchmark performance gaps
Real-world use case pain points
Feature requests for future releases
Integration compatibility issues
Roadmap and Future Models
What's Next:
North Mini Code is the first, not the last
Larger models coming: More powerful variants in development
North Mini Code was launched by Cohere on June 9, 2026 as their first open-source agentic coding model under Apache 2.0 license. A 30B parameter mixture-of-experts model with 3B active parameters, it delivers competitive performance on SWE-Bench and Terminal-Bench while offering 2.8x throughput advantages over similar-sized models, positioning it as a sovereign AI solution for developers requiring on-premises deployment and complete control over their AI infrastructure.