Cohere North Mini Code: Open-Source Agentic Coding Model (Apache 2.0)

Cohere launches North Mini Code, a 30B-parameter MoE model (3B active) built for agentic software engineering. It is released under the Apache 2.0 license and reaches up to 2.8x the output throughput of Devstral Small 2.

15 min read · Yash Thakker
Tags: Cohere, Open Source, AI Coding, Agent Development, MoE Models

TL;DR: On June 9, 2026, Cohere launched North Mini Code—their first open-source agentic coding model under Apache 2.0 license. A 30B parameter mixture-of-experts (MoE) architecture with just 3B active parameters, North Mini Code delivers competitive performance on SWE-Bench Verified, SWE-Bench Pro, and Terminal-Bench 2.0 benchmarks while achieving up to 2.8x higher output throughput than Devstral Small 2. Built for sovereign AI deployment, it supports 256K context, requires just 1× H100 GPU at FP8, and is optimized for agentic software engineering workflows including sub-agent orchestration, systems architecture mapping, and code reviews.


Cohere's Entry into Open-Source Developer Models

After establishing itself with enterprise-focused models like Command A+, Cohere is expanding into the developer ecosystem with North Mini Code—the first model in their next generation of powerful, open-source AI systems.

The Sovereign AI Positioning

Core Mission: Enable developers to deploy agentic coding capabilities on their own terms—on-premises, locally, or in private clouds—without vendor lock-in or usage restrictions.

Market Context:

  • Closed Ecosystem: GPT-5.5, Claude Fable 5, Gemini 3.5 require API access
  • Open Alternatives: DeepSeek V4 Pro, Qwen, Llama 4 offer self-hosting
  • Gap: Limited small, efficient open models optimized for agentic coding

North Mini Code's Position:

graph TD
    A[Developer Needs] --> B{Deployment Preference?}
    B -->|Cloud API| C[GPT-5.5, Claude Fable 5]
    B -->|Self-Hosted| D[North Mini Code, Qwen, DeepSeek]
    D --> E{Size Constraint?}
    E -->|Large OK| F[DeepSeek V4 Pro 236B]
    E -->|Small Preferred| G[North Mini Code 30B MoE]
    G --> H[3B Active - Efficient]

Technical Specifications

Model Architecture

North Mini Code at a Glance:

| Specification | Details |
| --- | --- |
| Model Type | Mixture-of-Experts (MoE) |
| Total Parameters | 30B |
| Active Parameters | 3B (per forward pass) |
| License | Apache 2.0 |
| Context Length | 256K tokens total |
| Max Generation | 64K tokens |
| Precision | FP8 (recommended) |
| Hardware Minimum | 1× H100 GPU |
| Training Focus | Code generation, agentic workflows, terminal tasks |

Mixture-of-Experts Efficiency

How MoE Works:

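# Conceptual MoE forward pass (illustrative sketch, not Cohere's implementation)
import numpy as np

def moe_forward(tokens, router_w, experts, k=2):
    # tokens: (n_tokens, d); router_w: (d, n_experts); experts: list of callables
    # Router scores every expert for every token (learned in a real model)
    logits = tokens @ router_w
    # Only activate the top-k experts per token
    top_k = np.argsort(logits, axis=-1)[:, -k:]
    output = np.zeros_like(tokens, dtype=float)
    for t, expert_ids in enumerate(top_k):
        # Softmax over the selected experts' scores gives the mixing weights
        scores = np.exp(logits[t, expert_ids] - logits[t, expert_ids].max())
        weights = scores / scores.sum()
        # Sparse activation: only k experts run for this token
        for e, w in zip(expert_ids, weights):
            output[t] += w * experts[e](tokens[t])
    return output  # Only ~3B params active, not 30B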

Benefits:

  • Efficiency: Only 10% of parameters active per forward pass
  • Speed: Faster inference than 30B dense model
  • Specialization: Different experts for different coding patterns
  • Cost: Reduced memory and compute requirements
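The parameter counts above can be turned into a back-of-the-envelope estimate. The sketch below assumes weights dominate memory and ignores the KV cache, activations, and framework overhead; the function names are illustrative and not part of any Cohere tooling.

```python
def weight_memory_gb(total_params_billion: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # Billions of parameters times bytes per parameter gives gigabytes
    return total_params_billion * bytes_per_param


def active_fraction(active_params_billion: float, total_params_billion: float) -> float:
    """Share of parameters that participate in a single forward pass."""
    return active_params_billion / total_params_billion


# Figures from the spec table: 30B total parameters, 3B active
print(f"Active fraction: {active_fraction(3, 30):.0%}")
print(f"FP8 weights:  ~{weight_memory_gb(30, 8):.0f} GB")
print(f"BF16 weights: ~{weight_memory_gb(30, 16):.0f} GB")
```

The compute saving comes from the 10% active fraction, but every expert must still be resident in memory, so the memory footprint tracks the full 30B parameters.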

Context Window and Generation

256K Context Window:

  • Entire Codebases: Fit large projects in context
  • Multi-File Reasoning: Understand dependencies across files
  • Long Debugging Sessions: Maintain context through extended interactions
  • Documentation: Include extensive API references and docs

64K Maximum Generation:

  • Complete Modules: Generate full feature implementations
  • Comprehensive Refactors: Rewrite large sections of code
  • Detailed Explanations: Provide thorough code walkthroughs
  • Test Suites: Generate extensive test coverage
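Whether a project actually fits depends on how the window is split between prompt and output. The sketch below assumes the 256K window is shared by input and output (as "256K tokens total" suggests), treats 256K as 256,000 tokens, and uses a rough four-characters-per-token heuristic; real counts depend on the model's tokenizer, and all names here are illustrative.

```python
def input_budget(context_window: int = 256_000, reserved_for_output: int = 64_000) -> int:
    """Tokens left for the prompt once space for the response is reserved."""
    return context_window - reserved_for_output


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate (assumption: about 4 characters per token)."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(files: dict[str, str], budget: int) -> tuple[bool, int]:
    """Return whether the combined files fit the budget, plus the estimated total."""
    total = sum(estimate_tokens(source) for source in files.values())
    return total <= budget, total


# Hypothetical two-file project
project = {"app.py": "x" * 40_000, "models.py": "y" * 12_000}
ok, total = fits_in_context(project, input_budget())
print(f"Estimated {total} tokens, fits: {ok}")
```

Reserving the full 64K for output leaves about 192K tokens for code and documentation, so very large repositories still need file selection or retrieval.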

Comparison:

| Model | Context Window | Max Generation |
| --- | --- | --- |
| North Mini Code | 256K | 64K |
| GPT-4o | 128K | 16K |
| Claude 3.5 Sonnet | 200K | 8K |
| Devstral Small 2 | 128K | 32K |
| DeepSeek V4 Coder | 128K | 8K |

Performance Benchmarks

Agentic Software Engineering Scores

Cohere evaluated North Mini Code on industry-standard benchmarks for agentic coding capabilities:

SWE-Bench Performance:

| Benchmark | North Mini Code | Competitor Range* |
| --- | --- | --- |
| SWE-Bench Verified | Competitive** | 15-35% (similar size) |
| SWE-Bench Pro | Competitive** | 10-25% (similar size) |
| Terminal-Bench v2 | Competitive** | 35-55% (similar size) |
| Terminal-Bench Hard | Competitive** | 20-40% (similar size) |

*Competitor range includes Gemma 4 E4B, DeepSeek Coder 7B, Qwen2.5 Coder 32B, and similar-sized models.
**Cohere reports "competitive scores" without disclosing exact percentages.

Artificial Analysis Coding Index:

  • Score: 33.4
  • Interpretation: Aggregated performance across coding benchmarks
  • Positioning: Competitive among similarly sized open-source models

What These Benchmarks Measure

SWE-Bench Verified:

  • Real GitHub issues from popular repositories
  • Requires generating patches that pass existing test suites
  • Tests understanding of complex codebases
  • Measures production-ready code generation

Terminal-Bench 2.0:

  • 89 carefully curated tasks across diverse domains
  • Multi-step terminal workflows
  • System administration, ML, security, biology tasks
  • Tests agentic planning and tool use

Terminal-Bench Hard:

  • More challenging subset of Terminal-Bench tasks
  • Requires advanced reasoning and error recovery
  • Tests long-horizon task completion
  • Evaluates robustness to failures

Evaluation Harnesses Used:

# SWE-Bench evaluation
harness = "SWE-agent"  # Standard SWE-Bench harness

# Terminal-Bench v2 evaluation
harness = "ReAct + single terminal tool"  # Simple reasoning loop

# Terminal-Bench Hard evaluation
harness = "Terminus-2"  # Advanced multi-tool harness
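To make "ReAct + single terminal tool" concrete, here is a minimal sketch of that style of loop. It is a generic illustration, not the harness Cohere or Terminal-Bench uses: `policy` is a hypothetical stand-in for the model, and the `"DONE"` stop signal is an assumption of this example.

```python
import subprocess
from typing import Callable

History = list[tuple[str, str]]


def run_shell(command: str, timeout: int = 30) -> str:
    """The single terminal tool: run a command and return its combined output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


def react_loop(
    policy: Callable[[History], str],
    tool: Callable[[str], str] = run_shell,
    max_steps: int = 10,
) -> History:
    """Alternate between choosing a command and observing its output.

    `policy` sees the (command, observation) history and returns the next
    command, or "DONE" to stop.
    """
    history: History = []
    for _ in range(max_steps):
        command = policy(history)
        if command == "DONE":
            break
        observation = tool(command)
        history.append((command, observation))
    return history


# Stub policy: list files once, then stop. A real policy would call the model.
def demo_policy(history: History) -> str:
    return "ls" if not history else "DONE"


# A fake tool keeps the demo free of side effects
print(react_loop(demo_policy, tool=lambda cmd: f"(output of {cmd})"))
```

Because the default tool executes arbitrary shell commands, a loop like this belongs inside a sandbox or container when driven by a real model.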

Speed and Efficiency Advantages

Cohere conducted internal performance testing comparing North Mini Code to Devstral Small 2 (Mistral's small coding model) under identical hardware and concurrency conditions.

Output Throughput: 2.8x Advantage

Test Configuration:

  • Hardware: Identical GPU setup (1× H100)
  • Concurrency: High and low concurrency levels tested
  • Workload: Real-world coding prompts
  • Precision: FP8 for both models

Results:

| Metric | North Mini Code | Devstral Small 2 | Advantage |
| --- | --- | --- | --- |
| Output Throughput (High Concurrency) | 2.8x baseline | 1.0x baseline | +180% |
| Output Throughput (Low Concurrency) | 2.5x baseline | 1.0x baseline | +150% |
| Inter-Token Latency | 30% lower | Baseline | +30% |
| Time-to-First-Token (TTFT) | Slightly slower | Baseline | -5% |

Practical Implications:

Example: Generating a 1,000-token code file

Devstral Small 2:  ~10 seconds
North Mini Code:   ~3.5 seconds  (2.8x faster)

In a development session generating 10 files:
Devstral Small 2:  100 seconds
North Mini Code:   35 seconds   (saves 65 seconds)
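The arithmetic behind this example is a single division. The sketch below assumes a hypothetical baseline rate of 100 tokens per second (chosen so that 1,000 tokens take 10 seconds) and assumes that per-request speed scales with the reported throughput ratio, which aggregate throughput under concurrency does not guarantee.

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second


BASELINE_TPS = 100.0  # hypothetical Devstral Small 2 rate, not a measured value
SPEEDUP = 2.8         # Cohere's reported high-concurrency throughput ratio

baseline = generation_time(1_000, BASELINE_TPS)
faster = generation_time(1_000, BASELINE_TPS * SPEEDUP)

print(f"Baseline: {baseline:.1f} s, North Mini Code: {faster:.1f} s")
print(f"Saved over 10 files: {10 * (baseline - faster):.0f} s")
```

Run as written, this gives about 3.6 seconds per file and roughly 64 seconds saved across ten files, in line with the rounded figures above.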

Inter-Token Latency Improvements

Inter-token latency (ITL) is the time between consecutive output tokens once generation has started. It determines how smooth streaming feels and, together with TTFT, how long a full response takes.

30% Improvement:

  • Smoother Streaming: More consistent token delivery
  • Better UX: Reduces perceived "stuttering" during generation
  • Predictable Performance: More reliable latency characteristics
  • Higher Throughput: Less time waiting between tokens accumulates

Time-to-First-Token Trade-off

TTFT (Time-to-First-Token):

  • North Mini Code: Slightly slower than Devstral Small 2 (~5% difference)
  • Devstral Small 2: Maintains edge in prompt processing speed

Why the Trade-off Exists:

MoE Architecture Impact:
- Routing overhead: Selecting which experts to activate
- Sparse activation: Coordinating distributed experts
- Offset by: Massive throughput gains during generation

Net Result: Slightly slower to start, much faster to complete

When It Matters:

  • ⚠️ Short Completions: TTFT dominates total time
  • Long Completions: Throughput gains overwhelm TTFT delay
  • Batch Processing: Amortized startup cost negligible
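The point at which throughput gains outweigh the slower start can be estimated with a simple latency model: total time is TTFT plus one inter-token gap per remaining token. The absolute numbers below are hypothetical; only the ratios (about 5% slower TTFT, 30% lower inter-token latency) come from the figures above.

```python
def total_latency(ttft_s: float, itl_s: float, num_tokens: int) -> float:
    """Time to receive num_tokens: first-token wait plus per-token gaps."""
    return ttft_s + itl_s * max(num_tokens - 1, 0)


def break_even_tokens(ttft_a: float, itl_a: float, ttft_b: float, itl_b: float) -> float:
    """Output length beyond which model B (slower start, faster tokens) wins.

    Solves ttft_a + itl_a * (n - 1) = ttft_b + itl_b * (n - 1) for n.
    Requires itl_a > itl_b.
    """
    return 1 + (ttft_b - ttft_a) / (itl_a - itl_b)


# Hypothetical baseline: 500 ms TTFT and 10 ms per token
base_ttft, base_itl = 0.500, 0.010
# Model B: 5% slower TTFT, 30% lower inter-token latency
fast_ttft, fast_itl = base_ttft * 1.05, base_itl * 0.70

n = break_even_tokens(base_ttft, base_itl, fast_ttft, fast_itl)
print(f"Model B is faster end to end beyond ~{n:.1f} output tokens")
```

Under these assumed numbers the crossover comes after only a handful of tokens, which is why the TTFT penalty matters mainly for very short completions.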

Agentic Coding Capabilities

North Mini Code is specifically optimized for agentic software engineering workflows—multi-step, autonomous coding tasks that go beyond simple code completion.

Sub-Agent Orchestration

Capability: Understand and coordinate multiple specialized sub-agents

Example Workflow:

# North Mini Code orchestrating sub-agents
main_agent_prompt = """
Task: Implement user authentication system

Sub-agents to coordinate:
1. Database Schema Agent: Design user tables and indexes
2. API Endpoint Agent: Create REST endpoints for auth
3. Frontend Form Agent: Build login/signup components
4. Security Review Agent: Audit for common vulnerabilities
5. Test Generation Agent: Write integration tests

Orchestrate these agents to build a complete auth system.
"""

# North Mini Code response demonstrates:
# - Planning coordination between agents
# - Passing outputs from one agent to next
# - Validating intermediate results
# - Error recovery when sub-agent fails
# - Final integration of all components

Why This Matters:

  • Modern agentic coding involves teams of specialized agents
  • Main agent must understand dependencies and sequencing
  • Requires reasoning about agent capabilities and outputs
  • Critical for scaling to complex software projects
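The dependency and sequencing problem can be sketched independently of any model. In the example below each sub-agent is a plain Python callable standing in for a model call, and the agent names and dependency map are hypothetical; the ordering uses the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A sub-agent receives the outputs of its upstream agents and returns its own
SubAgent = Callable[[dict[str, str]], str]


def orchestrate(
    agents: dict[str, SubAgent], depends_on: dict[str, set[str]]
) -> dict[str, str]:
    """Run sub-agents in dependency order, passing upstream outputs along."""
    graph = {name: depends_on.get(name, set()) for name in agents}
    outputs: dict[str, str] = {}
    # static_order() yields each agent only after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: outputs[dep] for dep in graph[name]}
        outputs[name] = agents[name](upstream)
    return outputs


# Stub agents; real ones would prompt the model with their upstream context
agents = {
    "schema": lambda up: "users table",
    "api": lambda up: f"endpoints using {up['schema']}",
    "tests": lambda up: f"tests for {up['api']}",
}
depends_on = {"api": {"schema"}, "tests": {"api"}}

results = orchestrate(agents, depends_on)
print(results["tests"])
```

`TopologicalSorter` raises `CycleError` if the dependency map is circular, which gives the orchestrator an early failure instead of a deadlock. Validation and retry of failed sub-agents are left out of this sketch.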

Systems Architecture Mapping

Capability: Analyze existing systems and map their architecture

Example Use Case:

prompt = """
Analyze this codebase and provide:
1. High-level architecture diagram (components and relationships)
2. Data flow between modules
3. External dependencies and integrations
4. Potential bottlenecks or anti-patterns
5. Recommendations for refactoring

[Codebase context with 50+ files]
"""

# North Mini Code can:
# - Parse relationships across multiple files
# - Identify architectural patterns (MVC, microservices, etc.)
# - Trace data flow through the system
# - Detect design issues (circular dependencies, tight coupling)
# - Suggest improvements aligned with best practices

Applications:

  • Legacy Code Understanding: Onboarding to unfamiliar codebases
  • Refactoring Planning: Identifying modules to redesign
  • Microservices Migration: Mapping monoliths to service boundaries
  • Documentation: Auto-generating architecture docs
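Some of these checks can be done deterministically and used to verify a model's claims. As one example, the sketch below detects circular imports in a small Python project using the standard `ast` module. It is a simplified illustration: module names are matched exactly, and relative or dotted package imports are not resolved.

```python
import ast
from typing import Optional


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the in-project modules it imports."""
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            graph[name].update(t for t in targets if t in sources and t != name)
    return graph


def find_cycle(graph: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one import cycle as a list of modules, or None if there is none."""
    visiting: list[str] = []
    done: set[str] = set()

    def visit(module: str) -> Optional[list[str]]:
        if module in visiting:
            return visiting[visiting.index(module):] + [module]
        if module in done:
            return None
        visiting.append(module)
        for dep in sorted(graph[module]):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(module)
        return None

    for module in sorted(graph):
        cycle = visit(module)
        if cycle:
            return cycle
    return None


# Hypothetical three-module project with a circular dependency
sources = {"a": "import b", "b": "from c import helper", "c": "import a"}
print(find_cycle(import_graph(sources)))
```

A graph like this can also be passed to the model as structured context, so that its architectural summary starts from verified dependencies.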

Code Review Automation

Capability: Perform comprehensive code reviews like a senior engineer

Review Dimensions:

code_review_prompt = """
Review this pull request for:

1. Code Quality
   - Readability and maintainability
   - Adherence to style guides
   - Proper naming conventions

2. Correctness
   - Logic errors and edge cases
   - Proper error handling
   - Input validation

3. Performance
   - Algorithmic complexity
   - Resource usage (memory, I/O)
   - Database query optimization

4. Security
   - SQL injection, XSS vulnerabilities
   - Authentication and authorization
   - Sensitive data handling

5. Testing
   - Test coverage adequacy
   - Missing test scenarios
   - Test quality and clarity

[Pull request diff]
"""

North Mini Code Output:

  • Specific line-by-line comments
  • Suggested improvements with code examples
  • Security vulnerability identification
  • Performance optimization recommendations
  • Test case suggestions

Deployment Options and Availability

Multiple Access Channels

1. Hugging Face (Weights Download)

Free access to model weights for self-deployment:

# Shell: download via the Hugging Face CLI
huggingface-cli download cohere/north-mini-code-1.0 \
  --local-dir ./north-mini-code

# Python: load with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "cohere/north-mini-code-1.0",
    device_map="auto",
    torch_dtype="auto",  # use the dtype stored in the checkpoint; "float8" is not a valid torch dtype name
)

tokenizer = AutoTokenizer.from_pretrained("cohere/north-mini-code-1.0")

# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

2. Cohere Model Vault

Fully managed inference platform:

import cohere

# ClientV2 uses Cohere's Chat API, which replaces the legacy generate() call
co = cohere.ClientV2(api_key="your-cohere-api-key")

response = co.chat(
    model="north-mini-code-1.0",
    messages=[
        {"role": "user", "content": "Create a REST API endpoint for user authentication"}
    ],
    max_tokens=2048,
    temperature=0.2,
)

print(response.message.content[0].text)

Benefits:

  • Zero infrastructure management
  • Automatic scaling
  • Built-in monitoring and logging
  • Optimized inference performance
  • Pay-per-use pricing

3. Cohere API

Direct API access for integration:

curl https://api.cohere.com/v2/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "north-mini-code-1.0",
    "messages": [{"role": "user", "content": "Implement binary search in Python"}],
    "max_tokens": 1024,
    "temperature": 0.1
  }'

4. OpenRouter

Multi-provider access platform:

import openai

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)

completion = client.chat.completions.create(
    model="cohere/north-mini-code-1.0",
    messages=[
        {"role": "system", "content": "You are an expert programmer."},
        {"role": "user", "content": "Write a SQL query to find duplicate emails"}
    ]
)

print(completion.choices[0].message.content)

OpenCode Compatibility

North Mini Code is specifically optimized for the OpenCode agent harness:

What is OpenCode?

  • Open-source agentic coding framework
  • Similar architecture to Claude Code, Codex CLI
  • Supports terminal access, file operations, web search
  • Extensible via plugins and custom tools

Integration Example:

# Install OpenCode (illustrative commands; check the OpenCode docs for the current install and config syntax)
pip install opencode-agents

# Configure North Mini Code backend
opencode config set model cohere/north-mini-code-1.0
opencode config set api_key YOUR_COHERE_KEY

# Run agentic coding session
opencode run "Build a FastAPI app with SQLite database"

# North Mini Code will:
# 1. Plan the project structure
# 2. Create necessary files
# 3. Write FastAPI endpoints
# 4. Set up SQLite connection
# 5. Generate tests
# 6. Validate everything works

Why OpenCode Matters:

  • Provides agent harness for real-world tasks
  • Enables agentic workflows North Mini Code was trained for
  • Open-source alternative to proprietary coding agents
  • Community-driven development and extensions

Use Cases and Applications

1. Rapid Prototyping

Scenario: Startup needs to validate product idea quickly

task = """
Build a minimal viable product for a task management app:
- REST API (FastAPI)
- SQLite database with tasks table
- CRUD operations (create, read, update, delete tasks)
- Basic authentication
- React frontend with task list and add/edit forms
- Deploy script for Vercel (backend) and Netlify (frontend)
"""

# North Mini Code generates:
# - Complete backend with 5 endpoints
# - Database schema and migrations
# - Frontend with 3 components
# - Integration tests
# - Deployment configurations
# Total: ~2,000 lines of production-ready code in minutes

Time Savings:

  • Traditional development: 2-3 days
  • With North Mini Code: 1-2 hours (review + testing)
  • Savings: 90%+ reduction in initial development time

2. Legacy Codebase Modernization

Scenario: Enterprise migrating from Python 2.7 to Python 3.12

task = """
Analyze this Python 2.7 codebase and:
1. Identify compatibility issues with Python 3.12
2. Generate migration plan with risk assessment
3. Refactor critical modules first (auth, database)
4. Update tests to pass with Python 3.12
5. Document breaking changes and migration steps
"""

# North Mini Code:
# - Scans 50,000+ lines of legacy code
# - Identifies 200+ compatibility issues
# - Prioritizes 15 critical modules
# - Generates refactored code with Python 3.12 features
# - Creates comprehensive migration documentation

Benefits:

  • Handles tedious but critical migration work
  • Identifies hidden dependencies and issues
  • Suggests modern Python patterns and features
  • Reduces manual migration errors

3. Security Audit Automation

Scenario: Security team needs to audit microservices for vulnerabilities

task = """
Audit these 10 microservices for security issues:
- SQL injection vulnerabilities
- XSS attack vectors
- Authentication bypass possibilities
- Authorization logic flaws
- Sensitive data exposure
- CSRF vulnerabilities
- Dependency vulnerabilities (outdated packages)

Provide:
- Severity ratings (Critical/High/Medium/Low)
- Exploit scenarios
- Remediation code examples
- Priority order for fixes
"""

# North Mini Code generates:
# - Comprehensive security report
# - 23 identified vulnerabilities across services
# - PoC exploit code for critical issues
# - Fixed code samples for each vulnerability
# - Dependency upgrade recommendations

Impact:

  • Automates first-pass security review
  • Identifies issues human reviewers might miss
  • Provides actionable remediation guidance
  • Scales security reviews across large codebases

4. Test Suite Generation

Scenario: Open-source project lacks comprehensive tests

task = """
Generate comprehensive test suite for this library:
- Unit tests for all public functions
- Integration tests for workflows
- Edge case coverage (null inputs, boundary conditions)
- Performance regression tests
- Mocking for external dependencies
- Test fixtures and helpers
- Achieve >90% code coverage
"""

# North Mini Code creates:
# - 150+ test cases covering all modules
# - Pytest fixtures for common test scenarios
# - Mock implementations for external APIs
# - Property-based tests using Hypothesis
# - Performance benchmarks
# - CI/CD integration (GitHub Actions)

Value:

  • Improves code quality through comprehensive testing
  • Catches regressions before deployment
  • Documents expected behavior through tests
  • Reduces manual test writing time by 80%+

Sovereign AI and On-Premises Deployment

Why Sovereign AI Matters

Definition: Sovereign AI means owning and controlling your AI infrastructure—models, data, and deployment—without dependencies on external providers.

Key Principles:

  1. Data Sovereignty: Training data and inference queries stay on-premises
  2. Model Control: Full access to model weights, no black boxes
  3. Deployment Flexibility: Run anywhere (on-prem, private cloud, edge)
  4. No Vendor Lock-in: Switch providers or self-host without migration pain

North Mini Code's Sovereign Advantages

1. Apache 2.0 License

Permissions:
✅ Commercial use
✅ Modification and derivatives
✅ Distribution
✅ Private use
✅ Patent use

Conditions:
- Include original license and copyright notice
- State changes if you modify the code

Limitations:
❌ No liability
❌ No warranty
❌ Trademark use restrictions

What This Means:

  • Deploy commercially without licensing fees
  • Modify model for domain-specific needs
  • Create proprietary derivatives (closed-source OK)
  • No usage restrictions or rate limits

2. Self-Hosting on Modest Hardware

Minimum Requirements:

GPU: 1× NVIDIA H100 (80GB VRAM)
Precision: FP8 (8-bit floating point)
Memory: ~30GB VRAM for model weights (30B parameters × 1 byte at FP8) + 10GB for KV cache
Disk: ~30GB for model storage

Cost:
Cloud (AWS p5.2xlarge): ~$10-15/hour
On-prem H100 server: ~$35,000 one-time (amortized over years)

Cost Comparison (1 year, heavy usage):

| Option | Setup Cost | Running Cost (1 year) | Total |
| --- | --- | --- | --- |
| OpenAI API | $0 | $50,000+ | $50,000+ |
| Cohere API | $0 | $30,000+ | $30,000+ |
| Cloud H100 | $0 | $87,600 (24/7) | $87,600 |
| On-Prem H100 | $35,000 | $5,000 (power + maintenance) | $40,000 |

Assumes heavy inference workload (1M tokens/day output)
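The cloud versus on-prem rows can be compared with a break-even calculation. The sketch below uses only the table's figures ($35,000 hardware, $5,000 per year to run, $10 per hour for a cloud H100, which matches the $87,600 row) and ignores depreciation, staffing, and utilization below 24/7.

```python
def breakeven_hours(
    hardware_cost: float, onprem_cost_per_hour: float, cloud_cost_per_hour: float
) -> float:
    """Hours of use after which buying hardware is cheaper than renting."""
    return hardware_cost / (cloud_cost_per_hour - onprem_cost_per_hour)


HOURS_PER_YEAR = 24 * 365

# $5,000 per year for power and maintenance, spread over every hour of the year
onprem_hourly = 5_000 / HOURS_PER_YEAR
hours = breakeven_hours(35_000, onprem_hourly, 10.0)

print(f"Cloud H100 for a year: ${10.0 * HOURS_PER_YEAR:,.0f}")
print(f"Break-even after ~{hours:,.0f} hours (~{hours / 24:.0f} days of 24/7 use)")
```

With these inputs, owning the hardware pays off after roughly 3,700 GPU-hours, or about five months of continuous use. At low utilization the cloud option stays cheaper for much longer.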

3. Air-Gapped Deployment

For environments requiring complete network isolation:

# Offline deployment workflow
# 1. Download model on internet-connected machine
huggingface-cli download cohere/north-mini-code-1.0 \
  --local-dir ./offline-model

# 2. Transfer to air-gapped environment (USB, secure transfer)
tar -czf north-mini-code.tar.gz ./offline-model
# ... physical transfer ...

# 3. Extract and deploy on air-gapped server
tar -xzf north-mini-code.tar.gz
python deploy_offline.py --model-path ./offline-model

# 4. Inference runs completely offline
# - No internet connectivity required
# - No telemetry or usage tracking
# - Complete data isolation
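After a physical transfer it is worth confirming that the weights arrived intact. One way to do that, sketched below with hypothetical function names, is to record SHA-256 digests on the connected machine and re-check them on the air-gapped server; this is an illustrative addition to the workflow, not part of any Cohere tooling.

```python
import hashlib
from pathlib import Path


def sha256_manifest(model_dir: str) -> dict[str, str]:
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    root = Path(model_dir)
    manifest: dict[str, str] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so multi-gigabyte weight files fit in memory
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def verify(model_dir: str, expected: dict[str, str]) -> list[str]:
    """Return the paths that are missing or whose digest differs from expected."""
    actual = sha256_manifest(model_dir)
    return [path for path, digest in expected.items() if actual.get(path) != digest]
```

Typical use is to call `sha256_manifest("./offline-model")` before the transfer, carry the resulting mapping along (for example as a JSON file), and call `verify` on the destination; an empty list means every file matches.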

Use Cases:

  • Government and defense contractors
  • Healthcare (HIPAA compliance)
  • Financial institutions (regulatory requirements)
  • Trade secret protection (IP-sensitive companies)

Comparison: Sovereign vs. API-Based Models

| Feature | North Mini Code (Sovereign) | GPT-5.5 / Claude Fable (API) |
| --- | --- | --- |
| Data Privacy | Complete on-prem control | Data sent to provider |
| Customization | Full model access | Prompt-only customization |
| Cost Model | One-time hardware + power | Per-token pricing |
| Latency | Local (ms) | Network-dependent (100ms+) |
| Availability | Always (local) | Provider uptime-dependent |
| Compliance | Easier (data stays local) | Complex (third-party processors) |
| Rate Limits | None (your hardware) | Provider-imposed limits |
| Model Updates | Your choice when to update | Provider-controlled updates |

Limitations and Considerations

1. Capability Gap vs. Frontier Models

While North Mini Code is competitive among similarly-sized models, it lags behind large frontier models:

| Model | Size | SWE-Bench Verified (est.) | Note |
| --- | --- | --- | --- |
| GPT-5.5 | Proprietary | ~70%+ | State-of-the-art |
| Claude Fable 5 | Proprietary | ~65%+ | Excellent reasoning |
| DeepSeek V4 Pro | 236B MoE (21B active) | ~55-60% | Large open model |
| North Mini Code | 30B MoE (3B active) | ~25-35% (est.) | Small, efficient |

Trade-off:

  • ✅ Efficiency and cost vs. ⚠️ Lower absolute performance
  • ✅ Self-hosting flexibility vs. ⚠️ Capability limitations

When to Choose North Mini Code:

  • Task complexity is moderate (not cutting-edge research)
  • Data sovereignty is critical
  • Cost optimization is priority
  • Local deployment is required

When to Choose Frontier Models:

  • Maximum capability needed
  • Complex reasoning required
  • Cost is secondary to performance
  • Cloud deployment acceptable

2. MoE Router Overhead

Mixture-of-Experts Trade-offs:

Benefits:

  • ✅ Only 3B active parameters (10% of total)
  • ✅ Faster inference than 30B dense model
  • ✅ Specialist experts for different patterns

Drawbacks:

  • ⚠️ Routing overhead (selecting experts)
  • ⚠️ TTFT slightly slower than dense models
  • ⚠️ Load imbalance if routing is skewed
  • ⚠️ Memory still needed for full 30B model

Mitigation:

  • Optimize for long completions where throughput dominates
  • Use batching to amortize routing overhead
  • Monitor expert utilization for balanced routing
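Monitoring expert utilization can be as simple as counting how often each expert is selected. The sketch below assumes you can log, for each token, the indices of the experts the router chose; how to extract that from a given inference engine is not covered here, and the function names are illustrative.

```python
from collections import Counter


def expert_utilization(assignments: list[list[int]], num_experts: int) -> list[float]:
    """Fraction of routed token slots handled by each expert."""
    counts = Counter(e for token_experts in assignments for e in token_experts)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_experts
    return [counts.get(e, 0) / total for e in range(num_experts)]


def imbalance_ratio(utilization: list[float]) -> float:
    """Busiest expert's load relative to a uniform split; 1.0 is perfectly balanced."""
    return max(utilization) * len(utilization)


# Hypothetical routing log: four tokens, top-2 routing over four experts
assignments = [[0, 1], [0, 2], [0, 3], [1, 0]]
utilization = expert_utilization(assignments, num_experts=4)

print(utilization)                   # [0.5, 0.25, 0.125, 0.125]
print(imbalance_ratio(utilization))  # 2.0
```

A ratio well above 1.0 means a few experts handle most tokens, which can bottleneck expert-parallel deployments even when the average load looks fine.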

3. OpenCode Ecosystem Maturity

North Mini Code is optimized for OpenCode, but the OpenCode ecosystem is still maturing:

Current State:

  • ✅ Core functionality stable
  • ✅ Compatible with major coding agents
  • ⚠️ Limited plugin ecosystem vs. Claude Code/Codex
  • ⚠️ Smaller community than proprietary alternatives
  • ⚠️ Documentation still evolving

Workarounds:

  • North Mini Code also works with other harnesses (LangChain, CrewAI)
  • API access doesn't require OpenCode
  • Community is actively developing new tools

4. Benchmark Transparency

Cohere reports "competitive scores" without disclosing exact percentages:

What We Know:

  • ✅ Scores 33.4 on Artificial Analysis Coding Index
  • ✅ Internal comparisons show throughput advantages
  • ⚠️ Exact SWE-Bench / Terminal-Bench scores not published
  • ⚠️ Competitor comparisons limited

Why Transparency Matters:

  • Hard to evaluate true performance without exact numbers
  • Difficult to compare directly against other models
  • "Competitive" is subjective and vague

Community Action:

  • Independent benchmarking underway
  • Expect third-party evaluations soon
  • Early reports suggest mid-20s to mid-30s % on SWE-Bench Verified

Comparison to Competing Models

North Mini Code vs. Other Small Open Models

| Model | Size | License | Context | SWE-Bench (est.) | Availability |
| --- | --- | --- | --- | --- | --- |
| North Mini Code | 30B MoE (3B active) | Apache 2.0 | 256K | ~30% | Hugging Face, API |
| Devstral Small 2 | 22B | Apache 2.0 | 128K | ~28% | Hugging Face |
| DeepSeek Coder 7B | 7B | MIT | 128K | ~22% | Hugging Face |
| Qwen2.5 Coder 32B | 32B | Apache 2.0 | 128K | ~35% | Hugging Face |
| Gemma 4 E4B | 27B | Gemma License | 128K | ~25% | Hugging Face |

North Mini Code Advantages:

  • ✅ Larger context window (256K vs. 128K)
  • ✅ Higher throughput (2.8x vs. Devstral)
  • ✅ Optimized for agentic workflows
  • ✅ Strong inter-token latency

Competitor Advantages:

  • ⚠️ Qwen2.5 Coder: Slightly higher absolute scores
  • ⚠️ DeepSeek Coder: Smaller, easier to run
  • ⚠️ Devstral: Better TTFT performance

North Mini Code vs. Frontier Coding Models

| Feature | North Mini Code | Claude Fable 5 | GPT-5.5 Codex |
| --- | --- | --- | --- |
| Deployment | Self-hosted | API only | API only |
| Cost (1M output tokens) | ~$5-10 (self-hosted) | $50 | $60 |
| Data Privacy | Complete | Third-party | Third-party |
| Customization | Full model access | Prompt-level | Prompt-level |
| Performance | Moderate | Excellent | Excellent |
| Context | 256K | 200K | 128K |

When North Mini Code Wins:

  • Sovereign deployment requirement
  • Cost optimization critical
  • Data can't leave premises
  • Need customization beyond prompts

When Frontier Models Win:

  • Maximum capability needed
  • Cost is secondary concern
  • Cloud deployment acceptable
  • Cutting-edge features required

Getting Started with North Mini Code

Quick Start: API Access

1. Get Cohere API Key

# Sign up at https://cohere.ai
# Navigate to API Keys section
# Create new key for North Mini Code

2. Install SDK

pip install cohere

3. Generate Code

import cohere

# ClientV2 uses Cohere's Chat API, which replaces the legacy generate() call
co = cohere.ClientV2(api_key="YOUR_API_KEY")

response = co.chat(
    model="north-mini-code-1.0",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}
    ],
    max_tokens=512,
    temperature=0.2,
    stop_sequences=["```"],
)

print(response.message.content[0].text)

Self-Hosting Guide

Prerequisites:

  • NVIDIA GPU with 80GB+ VRAM (H100; an A100 80GB has the memory but no native FP8 compute)
  • CUDA 12.0+
  • Python 3.10+

Step 1: Download Model

# Install Hugging Face CLI
pip install huggingface_hub

# Download weights
huggingface-cli download cohere/north-mini-code-1.0 \
  --local-dir ./north-mini-code \
  --include "*.safetensors" "*.json"

Step 2: Install Inference Engine

# Option A: vLLM (recommended for throughput)
pip install vllm

# Option B: TGI (Text Generation Inference)
docker pull ghcr.io/huggingface/text-generation-inference:latest

# Option C: Transformers (simple but slower)
pip install transformers accelerate

Step 3: Launch Server

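# Using vLLM
# FP8 is selected with --quantization (--dtype does not accept float8);
# --served-model-name sets the model name that clients send in requests
python -m vllm.entrypoints.openai.api_server \
  --model ./north-mini-code \
  --served-model-name north-mini-code \
  --quantization fp8 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 256000 \
  --port 8000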

# Server runs at http://localhost:8000

Step 4: Query Server

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "north-mini-code",
    "prompt": "def quick_sort(arr):",
    "max_tokens": 512,
    "temperature": 0.1
  }'
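The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server from Step 3 is running on `localhost:8000` and serves the model under the name `north-mini-code`; the helper names are illustrative.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    model: str = "north-mini-code",
    max_tokens: int = 512,
    temperature: float = 0.1,
) -> dict:
    """Build the JSON body for an OpenAI-compatible /v1/completions call."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, base_url: str = "http://localhost:8000", **kwargs) -> str:
    """POST the request to the local server and return the generated text."""
    payload = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-compatible servers return completions under choices[i].text
    return body["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("def quick_sort(arr):"))
```

Keeping the payload builder separate from the network call makes it easy to unit-test request construction without a running server.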

OpenCode Integration

Install OpenCode:

pip install opencode-agents

Configure Backend:

# For self-hosted deployment
opencode config set backend http://localhost:8000
opencode config set model north-mini-code

# For Cohere API
opencode config set backend cohere
opencode config set api_key YOUR_COHERE_KEY
opencode config set model north-mini-code-1.0

Run Agentic Task:

# Interactive session
opencode chat

# One-shot task
opencode run "Build a Flask app with user authentication"

# With specific files
opencode run --files src/*.py "Refactor these modules to use async/await"

Community and Ecosystem

Open Development Philosophy

Cohere is building North Mini Code in the open, with community feedback shaping the roadmap:

Community Channels:

  • 🐦 Twitter/X: @CohereAI — Tag @Cohere to share builds
  • 💬 Discord: Join official Cohere Discord server
  • 🗨️ Reddit: r/CohereAI for discussions
  • 🐙 GitHub: Report issues, contribute to ecosystem tools

Feedback Priorities:

  • Benchmark performance gaps
  • Real-world use case pain points
  • Feature requests for future releases
  • Integration compatibility issues

Roadmap and Future Models

What's Next:

  • North Mini Code is the first, not the last
  • Larger models coming: More powerful variants in development
  • Specialized models: Domain-specific coding models (e.g., ML, frontend, systems)
  • Fine-tuning support: Official fine-tuning guides and tooling
  • Quantization options: 4-bit, 2-bit variants for edge deployment

Community-Driven Priorities:

  • Most-requested benchmarks will be prioritized
  • Integration gaps will be addressed
  • Documentation expanded based on common questions

Contributing to the Ecosystem

How to Help:

  1. Benchmark and Report: Run North Mini Code on your tasks, share results
  2. Build Tools: Create OpenCode plugins, integrations, workflows
  3. Write Guides: Tutorial content for common use cases
  4. Identify Issues: Report bugs, limitations, compatibility problems
  5. Showcase Projects: Share what you build to inspire others

Recognition:

  • Featured projects highlighted on Cohere blog
  • Community contributors credited in updates
  • Ecosystem tools promoted through official channels

Sources and References

Official Resources

Benchmarks:

  • Artificial Analysis Coding Index: 33.4
  • Internal testing vs. Devstral Small 2 (throughput, latency)
  • Evaluations on SWE-Bench, Terminal-Bench (harness-specific)


Cohere launched North Mini Code on June 9, 2026 as its first open-source agentic coding model, released under the Apache 2.0 license. A 30B-parameter mixture-of-experts model with 3B active parameters, it reports competitive performance on SWE-Bench and Terminal-Bench and up to 2.8x higher output throughput than Devstral Small 2 in Cohere's internal testing. That combination positions it as a sovereign AI option for developers who need on-premises deployment and full control over their AI infrastructure.
