TL;DR: On June 9, 2026, Cohere launched North Mini Code—their first open-source agentic coding model under Apache 2.0 license. A 30B parameter mixture-of-experts (MoE) architecture with just 3B active parameters, North Mini Code delivers competitive performance on SWE-Bench Verified, SWE-Bench Pro, and Terminal-Bench 2.0 benchmarks while achieving up to 2.8x higher output throughput than Devstral Small 2. Built for sovereign AI deployment, it supports 256K context, requires just 1× H100 GPU at FP8, and is optimized for agentic software engineering workflows including sub-agent orchestration, systems architecture mapping, and code reviews.
Cohere's Entry into Open-Source Developer Models
After establishing itself with enterprise-focused models like Command A+, Cohere is expanding into the developer ecosystem with North Mini Code—the first model in their next generation of powerful, open-source AI systems.
The Sovereign AI Positioning
Core Mission: Enable developers to deploy agentic coding capabilities on their own terms—on-premises, locally, or in private clouds—without vendor lock-in or usage restrictions.
Market Context:
- Closed Ecosystem: GPT-5.5, Claude Fable 5, Gemini 3.5 require API access
- Open Alternatives: DeepSeek V4 Pro, Qwen, Llama 4 offer self-hosting
- Gap: Limited small, efficient open models optimized for agentic coding
North Mini Code's Position:
graph TD
A[Developer Needs] --> B{Deployment Preference?}
B -->|Cloud API| C[GPT-5.5, Claude Fable 5]
B -->|Self-Hosted| D[North Mini Code, Qwen, DeepSeek]
D --> E{Size Constraint?}
E -->|Large OK| F[DeepSeek V4 Pro 236B]
E -->|Small Preferred| G[North Mini Code 30B MoE]
G --> H[3B Active - Efficient]
Technical Specifications
Model Architecture
North Mini Code at a Glance:
| Specification | Details |
|---|---|
| Model Type | Mixture-of-Experts (MoE) |
| Total Parameters | 30B |
| Active Parameters | 3B (per forward pass) |
| License | Apache 2.0 |
| Context Length | 256K tokens total |
| Max Generation | 64K tokens |
| Precision | FP8 (recommended) |
| Hardware Minimum | 1× H100 GPU |
| Training Focus | Code generation, agentic workflows, terminal tasks |
Mixture-of-Experts Efficiency
How MoE Works:
# Conceptual MoE forward pass (router and select_top_k are placeholders)
def moe_forward(input_tokens):
    # Route each token to specialized experts
    routing_weights = router(input_tokens)  # Learned routing
    # Only activate top-k experts per token
    active_experts = select_top_k(routing_weights, k=2)
    # Process with sparse activation: weighted sum of the selected experts
    output = sum(
        expert(input_tokens) * weight
        for expert, weight in active_experts
    )
    return output  # Only ~3B params active, not 30B
Benefits:
- ✅ Efficiency: Only 10% of parameters active per forward pass
- ✅ Speed: Faster inference than 30B dense model
- ✅ Specialization: Different experts for different coding patterns
- ✅ Cost: Reduced memory and compute requirements
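To make the routing step concrete, here is a minimal, dependency-free sketch of top-k gating for a single token. The eight scalar "experts" and the router logits are invented for illustration; Cohere has not published the expert count or router design, so treat this as a picture of the general technique rather than North Mini Code's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(token, router_logits, experts, k=2):
    """Weighted sum of the outputs of the k selected experts for one token."""
    return sum(w * experts[i](token) for i, w in route_top_k(router_logits, k))

# Toy setup (hypothetical): expert i simply multiplies its input by i + 1.
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(logits, k=2)   # experts 4 and 1 have the highest logits
output = moe_layer(1.0, logits, experts, k=2)
print(selected, output)
```

Only two of the eight experts run for this token, which is the source of the compute savings. All eight still have to be held in memory.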
Context Window and Generation
256K Context Window:
- Entire Codebases: Fit large projects in context
- Multi-File Reasoning: Understand dependencies across files
- Long Debugging Sessions: Maintain context through extended interactions
- Documentation: Include extensive API references and docs
64K Maximum Generation:
- Complete Modules: Generate full feature implementations
- Comprehensive Refactors: Rewrite large sections of code
- Detailed Explanations: Provide thorough code walkthroughs
- Test Suites: Generate extensive test coverage
Comparison:
| Model | Context Window | Max Generation |
|---|---|---|
| North Mini Code | 256K | 64K |
| GPT-4o | 128K | 16K |
| Claude 3.5 Sonnet | 200K | 8K |
| Devstral Small 2 | 128K | 32K |
| DeepSeek V4 Coder | 128K | 8K |
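Because the 256K window is a total that covers both prompt and output, reserving room for a long generation shrinks what the prompt can hold. The sketch below shows that budgeting. The exact token counts (256,000 and 64,000) are assumptions; the published figures are only given as "256K" and "64K".

```python
CONTEXT_WINDOW = 256_000   # assumed exact size of the "256K" total window
MAX_GENERATION = 64_000    # assumed exact size of the "64K" output cap

def prompt_budget(max_new_tokens: int) -> int:
    """Tokens left for the prompt once output space is reserved."""
    if not 0 < max_new_tokens <= MAX_GENERATION:
        raise ValueError(f"max_new_tokens must be in 1..{MAX_GENERATION}")
    return CONTEXT_WINDOW - max_new_tokens

def fits(prompt_tokens: int, max_new_tokens: int) -> bool:
    """True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens <= prompt_budget(max_new_tokens)

print(prompt_budget(64_000))    # room left when asking for the full 64K output
print(fits(200_000, 64_000))    # a 200K-token prompt with a full-length output
print(fits(200_000, 8_000))     # the same prompt with a short output
```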
Performance Benchmarks
Agentic Software Engineering Scores
Cohere evaluated North Mini Code on industry-standard benchmarks for agentic coding capabilities:
SWE-Bench Performance:
| Benchmark | North Mini Code | Competitor Range* |
|---|---|---|
| SWE-Bench Verified | Competitive** | 15-35% (similar size) |
| SWE-Bench Pro | Competitive** | 10-25% (similar size) |
| Terminal-Bench v2 | Competitive** | 35-55% (similar size) |
| Terminal-Bench Hard | Competitive** | 20-40% (similar size) |
*Competitor range includes Gemma 4 E4B, DeepSeek Coder 7B, Qwen2.5 Coder 32B, and similar-sized models
**Cohere reports "competitive scores" without disclosing exact percentages
Artificial Analysis Coding Index:
- Score: 33.4
- Interpretation: Aggregated performance across coding benchmarks
- Positioning: Competitive among similarly sized open-source models
What These Benchmarks Measure
SWE-Bench Verified:
- Real GitHub issues from popular repositories
- Requires generating patches that pass existing test suites
- Tests understanding of complex codebases
- Measures production-ready code generation
Terminal-Bench v2:
- 89 carefully curated tasks across diverse domains
- Multi-step terminal workflows
- System administration, ML, security, biology tasks
- Tests agentic planning and tool use
Terminal-Bench Hard:
- More challenging subset of Terminal-Bench tasks
- Requires advanced reasoning and error recovery
- Tests long-horizon task completion
- Evaluates robustness to failures
Evaluation Harnesses Used:
# SWE-Bench evaluation
harness = "SWE-agent" # Standard SWE-Bench harness
# Terminal-Bench v2 evaluation
harness = "ReAct + single terminal tool" # Simple reasoning loop
# Terminal-Bench Hard evaluation
harness = "Terminus-2" # Advanced multi-tool harness
Speed and Efficiency Advantages
Cohere conducted internal performance testing comparing North Mini Code to Devstral Small 2 (Mistral's small coding model) under identical hardware and concurrency conditions.
Output Throughput: 2.8x Advantage
Test Configuration:
- Hardware: Identical GPU setup (1× H100)
- Concurrency: High and low concurrency levels tested
- Workload: Real-world coding prompts
- Precision: FP8 for both models
Results:
| Metric | North Mini Code | Devstral Small 2 | Advantage |
|---|---|---|---|
| Output Throughput (High Concurrency) | 2.8x baseline | 1.0x baseline | +180% |
| Output Throughput (Low Concurrency) | 2.5x baseline | 1.0x baseline | +150% |
| Inter-Token Latency | 30% lower | Baseline | +30% |
| Time-to-First-Token (TTFT) | Slightly slower | Baseline | -5% |
Practical Implications:
Example: Generating a 1,000-token code file
Devstral Small 2: ~10 seconds
North Mini Code: ~3.6 seconds (10 s / 2.8)
In a development session generating 10 files:
Devstral Small 2: ~100 seconds
North Mini Code: ~36 seconds (saves ~64 seconds)
Inter-Token Latency Improvements
Inter-token latency (ITL) is the average time between consecutive output tokens once generation has started. It determines how smooth streaming feels to the user.
30% Improvement:
- Smoother Streaming: More consistent token delivery
- Better UX: Reduces perceived "stuttering" during generation
- Predictable Performance: More reliable latency characteristics
- Higher Throughput: Shorter gaps between tokens add up over long generations
Time-to-First-Token Trade-off
TTFT (Time-to-First-Token):
- North Mini Code: Slightly slower than Devstral Small 2 (~5% difference)
- Devstral Small 2: Maintains edge in prompt processing speed
Why the Trade-off Exists:
MoE Architecture Impact:
- Routing overhead: Selecting which experts to activate
- Sparse activation: Coordinating distributed experts
- Offset by: Massive throughput gains during generation
Net Result: Slightly slower to start, much faster to complete
When It Matters:
- ⚠️ Short Completions: TTFT dominates total time
- ✅ Long Completions: Throughput gains overwhelm TTFT delay
- ✅ Batch Processing: Amortized startup cost negligible
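How short is "short"? A rough way to see it is to model total latency as TTFT plus output length divided by throughput, then solve for the output length where the two models tie. The 0.5-second baseline TTFT below is a made-up placeholder, since absolute TTFT values were not published. The 5% penalty and 2.8x throughput ratio come from the figures above.

```python
def total_latency(ttft_s: float, output_tokens: float, tokens_per_second: float) -> float:
    """Simple latency model: startup delay plus steady-state generation."""
    return ttft_s + output_tokens / tokens_per_second

def break_even_tokens(ttft_a: float, tps_a: float, ttft_b: float, tps_b: float) -> float:
    """Output length at which model A (slower start, faster generation)
    matches model B. Requires tps_a > tps_b."""
    return (ttft_a - ttft_b) / (1 / tps_b - 1 / tps_a)

TTFT_DENSE = 0.5                 # hypothetical baseline TTFT in seconds
TTFT_MOE = TTFT_DENSE * 1.05     # "~5% slower"
TPS_DENSE = 100.0                # assumed baseline tokens per second
TPS_MOE = TPS_DENSE * 2.8        # reported throughput ratio

n = break_even_tokens(TTFT_MOE, TPS_MOE, TTFT_DENSE, TPS_DENSE)
print(f"break-even at about {n:.1f} output tokens")
```

Under these assumptions the crossover sits at only a handful of tokens, so the TTFT penalty matters mainly for very short completions such as single-token classifications.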
Agentic Coding Capabilities
North Mini Code is specifically optimized for agentic software engineering workflows—multi-step, autonomous coding tasks that go beyond simple code completion.
Sub-Agent Orchestration
Capability: Understand and coordinate multiple specialized sub-agents
Example Workflow:
# North Mini Code orchestrating sub-agents
main_agent_prompt = """
Task: Implement user authentication system
Sub-agents to coordinate:
1. Database Schema Agent: Design user tables and indexes
2. API Endpoint Agent: Create REST endpoints for auth
3. Frontend Form Agent: Build login/signup components
4. Security Review Agent: Audit for common vulnerabilities
5. Test Generation Agent: Write integration tests
Orchestrate these agents to build a complete auth system.
"""
# North Mini Code response demonstrates:
# - Planning coordination between agents
# - Passing outputs from one agent to next
# - Validating intermediate results
# - Error recovery when sub-agent fails
# - Final integration of all components
Why This Matters:
- Modern agentic coding involves teams of specialized agents
- Main agent must understand dependencies and sequencing
- Requires reasoning about agent capabilities and outputs
- Critical for scaling to complex software projects
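The dependency-and-sequencing point can be illustrated without any model calls. In the sketch below, the dependency map between the five sub-agents is an assumption made for the example, and `run_agent` is a stand-in for whatever actually invokes a sub-agent. The ordering itself uses Python's standard `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each agent runs only after the agents
# in its set have finished.
DEPENDENCIES = {
    "schema": set(),
    "api": {"schema"},
    "frontend": {"api"},
    "security": {"api", "frontend"},
    "tests": {"api", "frontend"},
}

def run_agent(name, upstream):
    """Stand-in for a model call; returns a short artifact description."""
    return f"{name} built from [{', '.join(sorted(upstream))}]"

def orchestrate(dependencies, runner=run_agent):
    """Run agents in dependency order, handing each one its upstream outputs."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies[name]}
        results[name] = runner(name, upstream)
    return results

results = orchestrate(DEPENDENCIES)
order = list(results)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if two agents depend on each other, which is a cheap sanity check before any expensive calls are made.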
Systems Architecture Mapping
Capability: Analyze existing systems and map their architecture
Example Use Case:
prompt = """
Analyze this codebase and provide:
1. High-level architecture diagram (components and relationships)
2. Data flow between modules
3. External dependencies and integrations
4. Potential bottlenecks or anti-patterns
5. Recommendations for refactoring
[Codebase context with 50+ files]
"""
# North Mini Code can:
# - Parse relationships across multiple files
# - Identify architectural patterns (MVC, microservices, etc.)
# - Trace data flow through the system
# - Detect design issues (circular dependencies, tight coupling)
# - Suggest improvements aligned with best practices
Applications:
- Legacy Code Understanding: Onboarding to unfamiliar codebases
- Refactoring Planning: Identifying modules to redesign
- Microservices Migration: Mapping monoliths to service boundaries
- Documentation: Auto-generating architecture docs
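Some of these checks do not need a model at all, which makes them useful for validating what the model reports. As one example, the sketch below finds circular imports in a small Python project using only the standard `ast` module. The four module sources are invented, and the code only looks at absolute imports of top-level module names.

```python
import ast

def imported_modules(source: str) -> set:
    """Top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names

def build_graph(sources: dict) -> dict:
    """Map each module to the in-project modules it imports."""
    return {name: imported_modules(src) & set(sources) for name, src in sources.items()}

def find_cycle(graph: dict):
    """Return one import cycle as a list of modules, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in sorted(graph[node]):
            if color[dep] == GREY:                  # back edge closes a cycle
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Invented example project with one import cycle.
sources = {
    "orders": "import billing\n",
    "billing": "from users import get_user\n",
    "users": "import orders\nimport json\n",
    "utils": "import os\n",
}
graph = build_graph(sources)
cycle = find_cycle(graph)
print(" -> ".join(cycle))
```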
Code Review Automation
Capability: Perform comprehensive code reviews like a senior engineer
Review Dimensions:
code_review_prompt = """
Review this pull request for:
1. Code Quality
- Readability and maintainability
- Adherence to style guides
- Proper naming conventions
2. Correctness
- Logic errors and edge cases
- Proper error handling
- Input validation
3. Performance
- Algorithmic complexity
- Resource usage (memory, I/O)
- Database query optimization
4. Security
- SQL injection, XSS vulnerabilities
- Authentication and authorization
- Sensitive data handling
5. Testing
- Test coverage adequacy
- Missing test scenarios
- Test quality and clarity
[Pull request diff]
"""
North Mini Code Output:
- Specific line-by-line comments
- Suggested improvements with code examples
- Security vulnerability identification
- Performance optimization recommendations
- Test case suggestions
Deployment Options and Availability
Multiple Access Channels
1. Hugging Face (Weights Download)
Free access to model weights for self-deployment:
# Download via Hugging Face CLI
huggingface-cli download cohere/north-mini-code-1.0 \
--local-dir ./north-mini-code \
--local-dir-use-symlinks False
# Load with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "cohere/north-mini-code-1.0",
    device_map="auto",
    torch_dtype="auto"  # use the dtype stored in the checkpoint; "float8" is not a valid torch dtype name
)
tokenizer = AutoTokenizer.from_pretrained("cohere/north-mini-code-1.0")
# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
2. Cohere Model Vault
Fully managed inference platform:
import cohere
co = cohere.Client(api_key="your-cohere-api-key")
response = co.generate(
model="north-mini-code-1.0",
prompt="Create a REST API endpoint for user authentication",
max_tokens=2048,
temperature=0.2
)
print(response.generations[0].text)
Benefits:
- Zero infrastructure management
- Automatic scaling
- Built-in monitoring and logging
- Optimized inference performance
- Pay-per-use pricing
3. Cohere API
Direct API access for integration:
curl https://api.cohere.ai/v1/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "north-mini-code-1.0",
"prompt": "Implement binary search in Python",
"max_tokens": 1024,
"temperature": 0.1
}'
4. OpenRouter
Multi-provider access platform:
import openai
client = openai.OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="your-openrouter-key"
)
completion = client.chat.completions.create(
model="cohere/north-mini-code-1.0",
messages=[
{"role": "system", "content": "You are an expert programmer."},
{"role": "user", "content": "Write a SQL query to find duplicate emails"}
]
)
print(completion.choices[0].message.content)
OpenCode Compatibility
North Mini Code is specifically optimized for the OpenCode agent harness:
What is OpenCode?
- Open-source agentic coding framework
- Similar architecture to Claude Code, Codex CLI
- Supports terminal access, file operations, web search
- Extensible via plugins and custom tools
Integration Example:
# Install OpenCode
pip install opencode-agents
# Configure North Mini Code backend
opencode config set model cohere/north-mini-code-1.0
opencode config set api_key YOUR_COHERE_KEY
# Run agentic coding session
opencode run "Build a FastAPI app with SQLite database"
# North Mini Code will:
# 1. Plan the project structure
# 2. Create necessary files
# 3. Write FastAPI endpoints
# 4. Set up SQLite connection
# 5. Generate tests
# 6. Validate everything works
Why OpenCode Matters:
- Provides agent harness for real-world tasks
- Enables agentic workflows North Mini Code was trained for
- Open-source alternative to proprietary coding agents
- Community-driven development and extensions
Use Cases and Applications
1. Rapid Prototyping
Scenario: Startup needs to validate product idea quickly
task = """
Build a minimal viable product for a task management app:
- REST API (FastAPI)
- SQLite database with tasks table
- CRUD operations (create, read, update, delete tasks)
- Basic authentication
- React frontend with task list and add/edit forms
- Deploy script for Vercel (backend) and Netlify (frontend)
"""
# North Mini Code generates:
# - Complete backend with 5 endpoints
# - Database schema and migrations
# - Frontend with 3 components
# - Integration tests
# - Deployment configurations
# Total: ~2,000 lines of production-ready code in minutes
Time Savings:
- Traditional development: 2-3 days
- With North Mini Code: 1-2 hours (review + testing)
- Savings: 90%+ reduction in initial development time
2. Legacy Codebase Modernization
Scenario: Enterprise migrating from Python 2.7 to Python 3.12
task = """
Analyze this Python 2.7 codebase and:
1. Identify compatibility issues with Python 3.12
2. Generate migration plan with risk assessment
3. Refactor critical modules first (auth, database)
4. Update tests to pass with Python 3.12
5. Document breaking changes and migration steps
"""
# North Mini Code:
# - Scans 50,000+ lines of legacy code
# - Identifies 200+ compatibility issues
# - Prioritizes 15 critical modules
# - Generates refactored code with Python 3.12 features
# - Creates comprehensive migration documentation
Benefits:
- Handles tedious but critical migration work
- Identifies hidden dependencies and issues
- Suggests modern Python patterns and features
- Reduces manual migration errors
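For a sense of what "identifying compatibility issues" involves at the simplest level, here is a rough regex-based first pass over Python 2 source. The pattern list is a small, hand-picked sample and will miss plenty. It is meant as a triage aid, not a substitute for an AST-based migration tool or for the model's own analysis.

```python
import re

# A few well-known Python 2 idioms that fail on Python 3.
PY2_PATTERNS = {
    "print statement": re.compile(r"^\s*print\s+[^(\s]"),
    "xrange()": re.compile(r"\bxrange\s*\("),
    "dict.iteritems()": re.compile(r"\.iteritems\s*\("),
    "dict.has_key()": re.compile(r"\.has_key\s*\("),
    "old except syntax": re.compile(r"^\s*except\s+[\w.]+\s*,\s*\w+\s*:"),
    "unicode()": re.compile(r"\bunicode\s*\("),
}

def scan(source: str):
    """Return (line_number, issue) pairs for every matching line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for issue, pattern in PY2_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, issue))
    return findings

legacy = '''\
for i in xrange(10):
    print "item", i
try:
    d.has_key("a")
except KeyError, e:
    pass
'''
findings = scan(legacy)
for lineno, issue in findings:
    print(f"line {lineno}: {issue}")
```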
3. Security Audit Automation
Scenario: Security team needs to audit microservices for vulnerabilities
task = """
Audit these 10 microservices for security issues:
- SQL injection vulnerabilities
- XSS attack vectors
- Authentication bypass possibilities
- Authorization logic flaws
- Sensitive data exposure
- CSRF vulnerabilities
- Dependency vulnerabilities (outdated packages)
Provide:
- Severity ratings (Critical/High/Medium/Low)
- Exploit scenarios
- Remediation code examples
- Priority order for fixes
"""
# North Mini Code generates:
# - Comprehensive security report
# - 23 identified vulnerabilities across services
# - PoC exploit code for critical issues
# - Fixed code samples for each vulnerability
# - Dependency upgrade recommendations
Impact:
- Automates first-pass security review
- Identifies issues human reviewers might miss
- Provides actionable remediation guidance
- Scales security reviews across large codebases
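A first-pass review like this is easier to trust when paired with deterministic checks. The sketch below is one such check for a single vulnerability class: it walks a Python file's syntax tree and flags `.execute(...)` calls whose query is assembled with an f-string, `+`, `%`, or `.format()`. It is a heuristic with obvious blind spots (for example, queries built on an earlier line) and the sample code is invented.

```python
import ast

def risky_execute_calls(source: str):
    """Line numbers of .execute(...) calls whose first argument is built dynamically."""
    risky = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"
                and node.args):
            continue
        query = node.args[0]
        dynamic = (
            isinstance(query, ast.JoinedStr)                     # f-string
            or (isinstance(query, ast.BinOp)
                and isinstance(query.op, (ast.Add, ast.Mod)))    # "..." + x or "..." % x
            or (isinstance(query, ast.Call)
                and isinstance(query.func, ast.Attribute)
                and query.func.attr == "format")                 # "...".format(x)
        )
        if dynamic:
            risky.append(node.lineno)
    return sorted(risky)

sample = '''\
cur.execute("SELECT * FROM users WHERE id = ?", (user_id,))
cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
cur.execute("SELECT * FROM users WHERE email = '%s'" % email)
'''
flagged = risky_execute_calls(sample)
print(flagged)
```

The parameterized query on the first line passes, while the two string-built queries are flagged.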
4. Test Suite Generation
Scenario: Open-source project lacks comprehensive tests
task = """
Generate comprehensive test suite for this library:
- Unit tests for all public functions
- Integration tests for workflows
- Edge case coverage (null inputs, boundary conditions)
- Performance regression tests
- Mocking for external dependencies
- Test fixtures and helpers
- Achieve >90% code coverage
"""
# North Mini Code creates:
# - 150+ test cases covering all modules
# - Pytest fixtures for common test scenarios
# - Mock implementations for external APIs
# - Property-based tests using Hypothesis
# - Performance benchmarks
# - CI/CD integration (GitHub Actions)
Value:
- Improves code quality through comprehensive testing
- Catches regressions before deployment
- Documents expected behavior through tests
- Reduces manual test writing time by 80%+
Sovereign AI and On-Premises Deployment
Why Sovereign AI Matters
Definition: Sovereign AI means owning and controlling your AI infrastructure—models, data, and deployment—without dependencies on external providers.
Key Principles:
- Data Sovereignty: Training data and inference queries stay on-premises
- Model Control: Full access to model weights, no black boxes
- Deployment Flexibility: Run anywhere (on-prem, private cloud, edge)
- No Vendor Lock-in: Switch providers or self-host without migration pain
North Mini Code's Sovereign Advantages
1. Apache 2.0 License
Permissions:
✅ Commercial use
✅ Modification and derivatives
✅ Distribution
✅ Private use
✅ Patent use
Conditions:
- Include original license and copyright notice
- State changes if you modify the code
Limitations:
❌ No liability
❌ No warranty
❌ Trademark use restrictions
What This Means:
- Deploy commercially without licensing fees
- Modify model for domain-specific needs
- Create proprietary derivatives (closed-source OK)
- No usage restrictions or rate limits
2. Self-Hosting on Modest Hardware
Minimum Requirements:
GPU: 1× NVIDIA H100 (80GB VRAM)
Precision: FP8 (8-bit floating point)
Memory: ~30GB VRAM for model weights at FP8 (1 byte per parameter) + ~10GB for KV cache
Disk: ~30GB for model storage
Cost:
Cloud (single-H100 on-demand instance): ~$10-15/hour
On-prem H100 server: ~$35,000 one-time (amortized over years)
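Weight memory depends on total parameters, not active ones, because every expert has to be resident even though only a few run per token. A back-of-the-envelope estimate follows. It covers weights only, since the KV cache depends on layer and head counts that have not been published.

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory for model weights in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9

TOTAL_PARAMS = 30e9   # all experts must be loaded, not just the ~3B active ones

estimates = {
    "FP8 (1 byte)": weight_memory_gb(TOTAL_PARAMS, 1),
    "BF16 (2 bytes)": weight_memory_gb(TOTAL_PARAMS, 2),
    "4-bit (0.5 byte)": weight_memory_gb(TOTAL_PARAMS, 0.5),
}
for label, gb in estimates.items():
    print(f"{label}: ~{gb:.0f} GB")
```

At FP8 this gives about 30 GB, in line with the disk footprint listed above and comfortably inside a single 80 GB H100.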
Cost Comparison (1 year, heavy usage):
| Option | Setup Cost | Running Cost (1 year) | Total |
|---|---|---|---|
| OpenAI API | $0 | $50,000+ | $50,000+ |
| Cohere API | $0 | $30,000+ | $30,000+ |
| Cloud H100 | $0 | $87,600 (24/7) | $87,600 |
| On-Prem H100 | $35,000 | $5,000 (power + maintenance) | $40,000 |
Assumes heavy inference workload (1M tokens/day output)
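The cloud and on-prem rows can be reproduced from their inputs, which also makes it easy to substitute your own prices. The sketch below uses the table's figures ($10/hour cloud, $35,000 hardware, $5,000/year running cost) and adds a break-even calculation that the table does not show.

```python
HOURS_PER_YEAR = 24 * 365

def cloud_cost(hourly_rate: float, hours: float) -> float:
    """Cost of renting a GPU for the given number of hours."""
    return hourly_rate * hours

def on_prem_cost(hardware: float, yearly_running: float, hours: float) -> float:
    """Hardware purchase plus running costs prorated by hours used."""
    return hardware + yearly_running * hours / HOURS_PER_YEAR

def break_even_hours(hourly_rate: float, hardware: float, yearly_running: float) -> float:
    """Hours of continuous use after which on-prem is cheaper than cloud."""
    return hardware / (hourly_rate - yearly_running / HOURS_PER_YEAR)

cloud_year = cloud_cost(10, HOURS_PER_YEAR)
on_prem_year = on_prem_cost(35_000, 5_000, HOURS_PER_YEAR)
crossover = break_even_hours(10, 35_000, 5_000)

print(f"cloud: ${cloud_year:,.0f}, on-prem: ${on_prem_year:,.0f}")
print(f"break-even after ~{crossover / 24:.0f} days of 24/7 use")
```

With these inputs the purchase pays for itself after roughly five months of round-the-clock use. At lower utilization the crossover moves out proportionally.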
3. Air-Gapped Deployment
For environments requiring complete network isolation:
# Offline deployment workflow
# 1. Download model on internet-connected machine
huggingface-cli download cohere/north-mini-code-1.0 \
--local-dir ./offline-model
# 2. Transfer to air-gapped environment (USB, secure transfer)
tar -czf north-mini-code.tar.gz ./offline-model
# ... physical transfer ...
# 3. Extract and deploy on air-gapped server
tar -xzf north-mini-code.tar.gz
python deploy_offline.py --model-path ./offline-model
# 4. Inference runs completely offline
# - No internet connectivity required
# - No telemetry or usage tracking
# - Complete data isolation
Use Cases:
- Government and defense contractors
- Healthcare (HIPAA compliance)
- Financial institutions (regulatory requirements)
- Trade secret protection (IP-sensitive companies)
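In regulated settings, the physical transfer step in the offline workflow usually comes with an integrity check. One way to do that, sketched below with the standard library only, is to record a SHA-256 manifest on the connected machine and verify it after extraction on the isolated one. The function names are illustrative and not part of any Cohere tooling.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight shards fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(model_dir):
    """Relative path -> SHA-256 for every file under model_dir."""
    root = Path(model_dir)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def verify(model_dir, manifest):
    """Return the files that are missing or whose hash differs."""
    current = build_manifest(model_dir)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

Run `build_manifest("./offline-model")` before creating the archive, carry the resulting dictionary across (for example as JSON), and call `verify` on the air-gapped side. An empty list means every file arrived intact.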
Comparison: Sovereign vs. API-Based Models
| Feature | North Mini Code (Sovereign) | GPT-5.5 / Claude Fable (API) |
|---|---|---|
| Data Privacy | Complete on-prem control | Data sent to provider |
| Customization | Full model access | Prompt-only customization |
| Cost Model | One-time hardware + power | Per-token pricing |
| Latency | Local (ms) | Network-dependent (100ms+) |
| Availability | Always (local) | Provider uptime-dependent |
| Compliance | Easier (data stays local) | Complex (third-party processors) |
| Rate Limits | None (your hardware) | Provider-imposed limits |
| Model Updates | Your choice when to update | Provider-controlled updates |
Limitations and Considerations
1. Capability Gap vs. Frontier Models
While North Mini Code is competitive among similarly sized models, it lags behind large frontier models:
| Model | Size | SWE-Bench Verified (est.) | Note |
|---|---|---|---|
| GPT-5.5 | Proprietary | ~70%+ | State-of-the-art |
| Claude Fable 5 | Proprietary | ~65%+ | Excellent reasoning |
| DeepSeek V4 Pro | 236B MoE (21B active) | ~55-60% | Large open model |
| North Mini Code | 30B MoE (3B active) | ~25-35% (est.) | Small, efficient |
Trade-off:
- ✅ Efficiency and cost vs. ⚠️ Lower absolute performance
- ✅ Self-hosting flexibility vs. ⚠️ Capability limitations
When to Choose North Mini Code:
- Task complexity is moderate (not cutting-edge research)
- Data sovereignty is critical
- Cost optimization is priority
- Local deployment is required
When to Choose Frontier Models:
- Maximum capability needed
- Complex reasoning required
- Cost is secondary to performance
- Cloud deployment acceptable
2. MoE Router Overhead
Mixture-of-Experts Trade-offs:
Benefits:
- ✅ Only 3B active parameters (10% of total)
- ✅ Faster inference than 30B dense model
- ✅ Specialist experts for different patterns
Drawbacks:
- ⚠️ Routing overhead (selecting experts)
- ⚠️ TTFT slightly slower than dense models
- ⚠️ Load imbalance if routing is skewed
- ⚠️ Memory still needed for full 30B model
Mitigation:
- Optimize for long completions where throughput dominates
- Use batching to amortize routing overhead
- Monitor expert utilization for balanced routing
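Monitoring expert utilization can be as simple as counting which expert each token was routed to and comparing the busiest expert against the average. The sketch below assumes you can log per-token expert indices from your serving stack, which may or may not be exposed. The sample assignments are invented.

```python
from collections import Counter

def expert_load(assignments, num_experts):
    """Fraction of routed tokens handled by each expert."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts[e] / total for e in range(num_experts)]

def imbalance(load):
    """Max load divided by mean load; 1.0 means perfectly balanced routing."""
    mean = sum(load) / len(load)
    return max(load) / mean

balanced = expert_load([0, 1, 2, 3], num_experts=4)
skewed = expert_load([0, 0, 0, 0, 1, 2, 3, 0], num_experts=4)

print(imbalance(balanced))   # every expert gets the same share
print(imbalance(skewed))     # expert 0 handles 5 of 8 tokens
```

A ratio well above 1.0 means one expert has become a hot spot, which limits the throughput benefit of sparse activation.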
3. OpenCode Ecosystem Maturity
North Mini Code is optimized for OpenCode, but the OpenCode ecosystem is still maturing:
Current State:
- ✅ Core functionality stable
- ✅ Compatible with major coding agents
- ⚠️ Limited plugin ecosystem vs. Claude Code/Codex
- ⚠️ Smaller community than proprietary alternatives
- ⚠️ Documentation still evolving
Workarounds:
- North Mini Code also works with other harnesses (LangChain, CrewAI)
- API access doesn't require OpenCode
- Community is actively developing new tools
4. Benchmark Transparency
Cohere reports "competitive scores" without disclosing exact percentages:
What We Know:
- ✅ Scores 33.4 on Artificial Analysis Coding Index
- ✅ Internal comparisons show throughput advantages
- ⚠️ Exact SWE-Bench / Terminal-Bench scores not published
- ⚠️ Competitor comparisons limited
Why Transparency Matters:
- Hard to evaluate true performance without exact numbers
- Difficult to compare directly against other models
- "Competitive" is subjective and vague
Community Action:
- Independent benchmarking underway
- Expect third-party evaluations soon
- Early reports suggest mid-20s to mid-30s % on SWE-Bench Verified
Comparison to Competing Models
North Mini Code vs. Other Small Open Models
| Model | Size | License | Context | SWE-Bench (est.) | Availability |
|---|---|---|---|---|---|
| North Mini Code | 30B MoE (3B active) | Apache 2.0 | 256K | ~30% | Hugging Face, API |
| Devstral Small 2 | 22B | Apache 2.0 | 128K | ~28% | Hugging Face |
| DeepSeek Coder 7B | 7B | MIT | 128K | ~22% | Hugging Face |
| Qwen2.5 Coder 32B | 32B | Apache 2.0 | 128K | ~35% | Hugging Face |
| Gemma 4 E4B | 27B | Gemma License | 128K | ~25% | Hugging Face |
North Mini Code Advantages:
- ✅ Larger context window (256K vs. 128K)
- ✅ Higher throughput (2.8x vs. Devstral)
- ✅ Optimized for agentic workflows
- ✅ Strong inter-token latency
Competitor Advantages:
- ⚠️ Qwen2.5 Coder: Slightly higher absolute scores
- ⚠️ DeepSeek Coder: Smaller, easier to run
- ⚠️ Devstral: Better TTFT performance
North Mini Code vs. Frontier Coding Models
| Feature | North Mini Code | Claude Fable 5 | GPT-5.5 Codex |
|---|---|---|---|
| Deployment | Self-hosted | API only | API only |
| Cost (1M output tokens) | ~$5-10 (self-hosted) | $50 | $60 |
| Data Privacy | Complete | Third-party | Third-party |
| Customization | Full model access | Prompt-level | Prompt-level |
| Performance | Moderate | Excellent | Excellent |
| Context | 256K | 200K | 128K |
When North Mini Code Wins:
- Sovereign deployment requirement
- Cost optimization critical
- Data can't leave premises
- Need customization beyond prompts
When Frontier Models Win:
- Maximum capability needed
- Cost is secondary concern
- Cloud deployment acceptable
- Cutting-edge features required
Getting Started with North Mini Code
Quick Start: API Access
1. Get Cohere API Key
# Sign up at https://cohere.ai
# Navigate to API Keys section
# Create new key for North Mini Code
2. Install SDK
pip install cohere
3. Generate Code
import cohere
co = cohere.Client("YOUR_API_KEY")
response = co.generate(
model="north-mini-code-1.0",
prompt="Write a Python function to calculate Fibonacci numbers",
max_tokens=512,
temperature=0.2,
stop_sequences=["```"]
)
print(response.generations[0].text)
Self-Hosting Guide
Prerequisites:
- NVIDIA GPU with 80GB+ VRAM (H100 recommended; the A100 80GB lacks native FP8 tensor cores)
- CUDA 12.0+
- Python 3.10+
Step 1: Download Model
# Install Hugging Face CLI
pip install huggingface_hub
# Download weights
huggingface-cli download cohere/north-mini-code-1.0 \
--local-dir ./north-mini-code \
--include "*.safetensors" "*.json"
Step 2: Install Inference Engine
# Option A: vLLM (recommended for throughput)
pip install vllm
# Option B: TGI (Text Generation Inference)
docker pull ghcr.io/huggingface/text-generation-inference:latest
# Option C: Transformers (simple but slower)
pip install transformers accelerate
Step 3: Launch Server
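# Using vLLM
# FP8 is selected with --quantization; --served-model-name sets the name clients pass as "model"
python -m vllm.entrypoints.openai.api_server \
  --model ./north-mini-code \
  --served-model-name north-mini-code \
  --quantization fp8 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 256000 \
  --port 8000
# Server runs at http://localhost:8000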
Step 4: Query Server
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "north-mini-code",
"prompt": "def quick_sort(arr):",
"max_tokens": 512,
"temperature": 0.1
}'
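The same request can be sent from Python without extra dependencies. The sketch below mirrors the curl call using only the standard library. It assumes the server is reachable at `http://localhost:8000` and is serving the model under the name `north-mini-code`. The helper names are illustrative.

```python
import json
import urllib.request

def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="north-mini-code", max_tokens=512,
                             temperature=0.1):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["choices"][0]["text"]

# Building the request does not touch the network; complete() does.
request = build_completion_request("def quick_sort(arr):")
print(request.full_url)
```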
OpenCode Integration
Install OpenCode:
pip install opencode-agents
Configure Backend:
# For self-hosted deployment
opencode config set backend http://localhost:8000
opencode config set model north-mini-code
# For Cohere API
opencode config set backend cohere
opencode config set api_key YOUR_COHERE_KEY
opencode config set model north-mini-code-1.0
Run Agentic Task:
# Interactive session
opencode chat
# One-shot task
opencode run "Build a Flask app with user authentication"
# With specific files
opencode run --files src/*.py "Refactor these modules to use async/await"
Community and Ecosystem
Open Development Philosophy
Cohere is building North Mini Code in the open, with community feedback shaping the roadmap:
Community Channels:
- 🐦 Twitter/X: @CohereAI — Tag @Cohere to share builds
- 💬 Discord: Join official Cohere Discord server
- 🗨️ Reddit: r/CohereAI for discussions
- 🐙 GitHub: Report issues, contribute to ecosystem tools
Feedback Priorities:
- Benchmark performance gaps
- Real-world use case pain points
- Feature requests for future releases
- Integration compatibility issues
Roadmap and Future Models
What's Next:
- North Mini Code is the first, not the last
- Larger models coming: More powerful variants in development
- Specialized models: Domain-specific coding models (e.g., ML, frontend, systems)
- Fine-tuning support: Official fine-tuning guides and tooling
- Quantization options: 4-bit, 2-bit variants for edge deployment
Community-Driven Priorities:
- Most-requested benchmarks will be prioritized
- Integration gaps will be addressed
- Documentation expanded based on common questions
Contributing to the Ecosystem
How to Help:
- Benchmark and Report: Run North Mini Code on your tasks, share results
- Build Tools: Create OpenCode plugins, integrations, workflows
- Write Guides: Tutorial content for common use cases
- Identify Issues: Report bugs, limitations, compatibility problems
- Showcase Projects: Share what you build to inspire others
Recognition:
- Featured projects highlighted on Cohere blog
- Community contributors credited in updates
- Ecosystem tools promoted through official channels
Sources and References
Official Resources
Announcement:
- Cohere Blog: North Mini Code Launch
- Published: June 9, 2026
Benchmarks:
- Artificial Analysis Coding Index: 33.4
- Internal testing vs. Devstral Small 2 (throughput, latency)
- Evaluations on SWE-Bench, Terminal-Bench (harness-specific)
North Mini Code was launched by Cohere on June 9, 2026 as their first open-source agentic coding model under Apache 2.0 license. A 30B parameter mixture-of-experts model with 3B active parameters, it delivers competitive performance on SWE-Bench and Terminal-Bench while offering 2.8x throughput advantages over similar-sized models, positioning it as a sovereign AI solution for developers requiring on-premises deployment and complete control over their AI infrastructure.