Can OpenCode use fully local open-source models?

Yes. OpenCode connects to any OpenAI-compatible endpoint: Ollama (localhost:11434/v1), llama-server (localhost:8080/v1), LM Studio, or vLLM. Add a custom provider in ~/.config/opencode/opencode.jsonc with baseURL pointing at the local server and apiKey set to a placeholder such as "local", then set "model" to your provider/model id. No code leaves your machine as long as the inference server runs locally.

What is the easiest local stack for OpenCode in 2026?

For beginners: ollama pull plus ollama serve, then OpenCode /connect or a short opencode.jsonc pointing at http://127.0.0.1:11434/v1. For maximum control and speed on Apple Silicon: llama.cpp with MTP (see Qwen 3.6 27B guide). For GUI model browsing: LM Studio in local server mode.

Which open-source model should I run locally for coding with OpenCode?

Dense Qwen 3.6 27B is the current Hacker News favorite for instruction-following and agent tasks on 48GB+ unified memory. GLM-5.2 via Unsloth needs heavier hardware but gets closer to frontier quality. Gemma 4 12B/31B is the pick if you need local multimodal. Match the quant to available memory: Q8 on a 48GB Mac, Q4 on 32GB. See the Mac vs GPU guide for tiers.

Can I mix local models and cloud APIs in OpenCode?

Yes. Define multiple providers in opencode.jsonc: local llama.cpp for private repos, Z.AI GLM Coding Plan or OpenRouter for frontier bursts. Switch with /models or edit the default "model" field. Tier workloads the way enterprise Fable-alternative guides recommend: sensitive code stays local, peak reasoning goes to an API.

What context window does OpenCode need for local models?

Agent coding loops benefit from 32k–64k minimum; 128k+ helps long refactors. Qwen 3.6 supports 256k native but 64k is a practical RAM default. Codex OSS mode documents 64k as a floor for agent loops — treat that as a good minimum for OpenCode local sessions too.
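A minimal sketch of how the 64k default could be declared on the OpenCode side, assuming the per-model `limit` field that OpenCode's custom-provider config accepts. The numbers are illustrative, and the `qwen3.6-27b` id and the 8192 output cap are assumptions, not values from a tested setup. The server has to be started with a matching context size as well (for llama-server, `-c 65536`).

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama": {
      "name": "llama.cpp (local)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1",
        "apiKey": "local"
      },
      "models": {
        "qwen3.6-27b": {
          "name": "Qwen3.6-27B (64k context)",
          // Assumed values: keep "context" equal to the size the server was launched with
          "limit": { "context": 65536, "output": 8192 }
        }
      }
    }
  },
  "model": "llama/qwen3.6-27b"
}
```

Declaring the limit lets the harness know how much room it has before it needs to compact a session. Raising it above what the server actually allocated will likely produce truncated or failed requests.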

Run Open Source Models Locally in OpenCode (2026) | explainx.ai Blog

How do I configure OpenCode for a local llama.cpp server?

Start llama-server on port 8080 with your GGUF model, then add a provider block to opencode.jsonc with npm set to @ai-sdk/openai-compatible, options.baseURL set to http://127.0.0.1:8080/v1, options.apiKey set to "local", and a models map. Set "model": "llama/your-model-id". The full copy-paste config is in this guide and in the Qwen 3.6 27B post.


July 1, 2026: With Fable 5 still offline and GLM-5.2 and Qwen 3.6 27B proving that local coding is practical, the missing piece for many developers is not which model to run but how to wire inference into a harness.

OpenCode is the open-source agent loop that accepts any OpenAI-compatible API. Run weights on your CPU/GPU, expose http://127.0.0.1:…/v1, point ~/.config/opencode/opencode.jsonc at it, and you have a local coding agent — same tools, LSP, and /init AGENTS.md flow as cloud setups, without sending repo context to a third party.

This is explainx.ai's end-to-end stack guide: pick a model → start a server → configure OpenCode → tier local vs cloud.

TL;DR — the full local + OpenCode path

Step	What to do	Deep dive
1. Hardware	32GB+ for Q4 7B–27B; 48GB+ for Q8 27B	Mac vs GPU guide
2. Model	coding default; if you have GPU headroom

```text
┌─────────────────────────────────────┐
│  OpenCode (agent harness)           │
│  tools · LSP · AGENTS.md · sessions │
└──────────────┬──────────────────────┘
               │ OpenAI-compatible HTTP
┌──────────────▼──────────────────────┐
│  Inference server                   │
│  llama.cpp · Ollama                 │
│  LM Studio · vLLM                   │
└──────────────┬──────────────────────┘
               │ GGUF / safetensors
┌──────────────▼──────────────────────┐
│  Open-weight model on your disk/GPU │
│  Qwen · GLM · Gemma · DeepSeek · …  │
└─────────────────────────────────────┘
```

Runtime	Best for	OpenAI API default	OpenCode fit
llama.cpp	Max control, MTP, Apple Silicon	`http://127.0.0.1:8080/v1`	Recommended for serious local coding
Ollama	One-command pull/serve	`http://127.0.0.1:11434/v1`	Fastest onboarding
LM Studio	GUI + local server toggle	`http://127.0.0.1:1234/v1`	Non-terminal users
vLLM	Multi-user / production GPU box	custom port `/v1`	Team LAN server

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama": {
      "name": "llama.cpp (local)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1",
        "apiKey": "local"
      },
      "models": {
        "qwen3.6-27b": { "name": "Qwen3.6-27B Q8 +MTP" }
      }
    }
  },
  "model": "llama/qwen3.6-27b"
}
```

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "name": "Ollama (local)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama"
      },
      "models": {
        // The key is the model id sent to the server, so it should match the tag from `ollama list`
        "qwen3.6:27b": { "name": "Qwen3.6-27B (Ollama)" }
      }
    }
  },
  "model": "ollama/qwen3.6:27b"
}
```
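The runtime table also lists LM Studio and vLLM, which follow the same pattern. The block below is a sketch rather than a tested config: the provider keys `lmstudio` and `vllm`, the model ids, and the LAN address are all hypothetical placeholders. The LM Studio port comes from the table above, and 8000 is vLLM's default port.

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "name": "LM Studio (local)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:1234/v1",
        "apiKey": "local"
      },
      "models": {
        // Hypothetical id: use the identifier LM Studio shows for the loaded model
        "qwen3.6-27b": { "name": "Qwen3.6-27B (LM Studio)" }
      }
    },
    "vllm": {
      "name": "vLLM (team server)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        // Hypothetical LAN address; replace with your server's host and port
        "baseURL": "http://192.168.1.50:8000/v1",
        "apiKey": "local"
      },
      "models": {
        // Hypothetical id: must match the model name the vLLM server was launched with
        "qwen3.6-27b": { "name": "Qwen3.6-27B (vLLM)" }
      }
    }
  },
  "model": "lmstudio/qwen3.6-27b"
}
```

Keeping both providers in one file means switching between the desktop GUI server and the team box is a `/models` choice rather than a config edit.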

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama": {
      "name": "Local Qwen",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1",
        "apiKey": "local"
      },
      "models": {
        "qwen3.6-27b": { "name": "Qwen3.6-27B local" }
      }
    },
    "zai": {
      "name": "GLM Coding Plan",
      "options": {
        "baseURL": "https://api.z.ai/api/coding/paas/v4"
      }
    }
  },
  "model": "llama/qwen3.6-27b"
}
```

Workload	Switch to
Private repo, offline flight	`llama/qwen3.6-27b`
Hard refactor, long agent	`/models` → GLM-5.2 (harness guide)
Cheap volume API	Cline $9.99 open weights or OpenRouter
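One way to make this tiering explicit, instead of switching by hand, is to pin a model to a named agent. The fragment below is a sketch that assumes OpenCode's `agent` config key with per-agent `model`, and that the `llama` and `zai` providers are already defined as in the mixed config. The agent name `deep-refactor` and the id `zai/glm-5.2` are hypothetical, so check both against your own `/models` list.

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  // Default: everything stays on the local llama.cpp provider
  "model": "llama/qwen3.6-27b",
  "agent": {
    // Hypothetical custom agent reserved for hard refactors on a cloud model
    "deep-refactor": {
      "description": "Long refactors that need frontier-level reasoning",
      "mode": "primary",
      "model": "zai/glm-5.2"
    }
  }
}
```

With this layout the private-repo default never leaves the machine, and sending context to the API becomes a deliberate act of selecting the cloud-backed agent.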

Model	Local fit	OpenCode notes
Qwen 3.6 27B dense	Best balance 48GB Mac / 24GB+ Nvidia	Beats MoE 35B A3B on instruction following
Gemma 4 12B/31B	Multimodal + Apache 2.0	Slower than Qwen for pure code
GLM-5.2 Unsloth	Workstation / multi-GPU	Closest to frontier local
DeepSeek V4 Flash (quant)	High tok/s if RAM allows	Aggressive quants trade quality
Kimi K2.7	Usually API — too large for most locals	Use API via OpenCode `/connect`

Machine	Realistic local model
32GB Apple Silicon	Q4 27B or Q8 7B–14B
48–64GB Apple Silicon	Q8 Qwen 3.6 27B (~32 tok/s MTP)
24GB Nvidia (RTX 4090; the 5090 has 32GB)	Q6 27B; Q4 70B needs partial CPU offload
64GB+ VRAM / dual GPU	GLM-5.2 class, vLLM team server
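The tiers above follow from a rough rule of thumb for weight memory, where P is the parameter count and b the nominal bits per weight. This is a back-of-the-envelope estimate, not a measurement: real GGUF quants store slightly more than their nominal bits, and the KV cache and runtime overhead come on top.

```latex
M_{\text{weights}} \approx \frac{P \cdot b}{8}\ \text{bytes}

\begin{aligned}
P = 27 \times 10^{9},\ b = 8 &: \quad M_{\text{weights}} \approx 27\ \text{GB} \\
P = 27 \times 10^{9},\ b = 4 &: \quad M_{\text{weights}} \approx 13.5\ \text{GB}
\end{aligned}
```

That is why a Q8 27B model wants a 48GB machine once context is added, while the Q4 build leaves workable headroom on 32GB.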

Symptom	Fix
Connection refused	Inference server not running or wrong port in baseURL
Empty / garbage output	Quant too low — bump Q4→Q6/Q8 (quant guide)
Slow first token	Model loading; MTP helps decode — see llama.cpp `--spec-type draft-mtp`
Ignores package.json / structure	Try dense over MoE; shorten context if RAM swapping
Tool calls fail	Model may lack reliable function-calling — switch to GLM API or GPT-class for agent-heavy runs
Wrong provider	Confirm which config file is loaded (for example ~/.config/opencode/opencode.jsonc); run `/models` to list what is available; check that the `model` string matches `provider/id`

Harness	Local OSS models	Notes
OpenCode	Yes — 75+ providers + custom local	Default open multi-surface
Pi	Yes — provider plugins	Minimal, ownable
Codex OSS	Yes — `--oss` + Ollama	OpenAI-native tool schema
Claude Code	Indirect — `/config` remote to desktop	Not weight-local
Kilo Code	Yes — BYOK + Ollama	VS Code extension

How to Run Open Source Models Locally and Wire Them Into OpenCode (2026)

TL;DR — the full local + OpenCode path

Related posts

Qwen 3.6 27B Local Dev Guide: llama.cpp, OpenCode, and Why Dense Beats MoE

LM Studio Bionic: Open-Model Agent for Code and Work Projects

What Is llama.cpp? Install, Run GGUF Models, and Serve OpenAI-Compatible APIs

Architecture — three layers

Step 1 — Install OpenCode

Step 2 — Pick an inference runtime

Step 3 — Start the local server

Option A — llama.cpp (Qwen 3.6 27B example)

Option B — Ollama

Option C — LM Studio

Option D — vLLM (Linux GPU server)

Step 4 — Configure OpenCode (`opencode.jsonc`)

llama.cpp provider block

Ollama provider block

LM Studio provider block

Step 5 — Cloud + local in one config (recommended)

Model picks for local OpenCode (July 2026)

Hardware reality check

First session checklist

Troubleshooting

OpenCode vs other local harnesses
