July 1, 2026: With Fable 5 still offline , GLM-5.2 and Qwen 3.6 27B proving local coding is practical, the missing piece for many developers is not which model โ it is how to wire inference into a harness .
OpenCode is the open-source agent loop that accepts any OpenAI-compatible API . Run weights on your CPU/GPU , expose http://127.0.0.1:โฆ/v1, point ~/.config/opencode/opencode.jsonc at it, and you have a local coding agent โ same tools, LSP, and /init AGENTS.md flow as cloud setups, without sending repo context to a third party.
This is explainx.ai's end-to-end stack guide : pick a model โ start a server โ configure OpenCode โ tier local vs cloud.
Weekly digest 3.4k readers
Catch up on AI
Curated AI updates on agents, skills, and MCP โ delivered to your inbox. Unsubscribe anytime.
TL;DR โ the full local + OpenCode path
Architecture โ three layers
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ OpenCode (agent harness) โ
โ tools ยท LSP ยท AGENTS.md ยท sessions โ
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโ
โ OpenAI-compatible HTTP
โโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโ
โ Inference server โ
โ llama.cpp ยท Ollama ยท LM Studio ยท vLLMโ
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโ
โ GGUF / safetensors
โโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโ
โ Open-weight model on your disk/GPU โ
โ Qwen ยท GLM ยท Gemma ยท DeepSeek ยท โฆ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
OpenCode never loads weights itself โ it only talks HTTP. That separation is why one config file can swap Ollama today and llama.cpp tomorrow.
Step 1 โ Install OpenCode
curl -fsSL https://opencode.ai/install | bash
cd your-project
opencode
First-run inside TUI:
/connect โ for cloud providers (Z.AI, OpenRouter, Copilot OAuth)
/init โ generate AGENTS.md project memory
/models โ pick active model after providers exist
Slash reference: OpenCode commands . Harness concepts: What is an agent harness? .
Step 2 โ Pick an inference runtime
Runtime Best for OpenAI API default OpenCode fit llama.cpp Max control, MTP, Apple Silicon http://127.0.0.1:8080/v1Recommended for serious local codingOllama One-command pull/serve http://127.0.0.1:11434/v1Fastest onboarding LM Studio GUI + local server toggle http://127.0.0.1:1234/v1Non-terminal users vLLM Multi-user / production GPU box custom port /v1 Team LAN server
Codex users: same local servers work in Codex OSS mode โ different harness, identical inference layer.
Step 3 โ Start the local server
Option A โ llama.cpp (Qwen 3.6 27B example)
From the Qwen 3.6 local benchmark post :
llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0 \
--spec-type draft-mtp -ngl 999 -fa on -c 65536 --port 8080
Smoke test: open http://127.0.0.1:8080 or curl http://127.0.0.1:8080/v1/models.
Option B โ Ollama
ollama pull qwen3.6:27b
ollama serve
Other coding pulls teams use: glm-5.2, deepseek-coder-v2, gemma4:12b โ check VRAM before pulling 70B-class tags.
Option C โ LM Studio
Download model in GUI
Local Server tab โ Start Server (default 1234 )
Enable OpenAI compatible API
Option D โ vLLM (Linux GPU server)
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3.6-27B-Instruct \
--port 8000
Point OpenCode at http://127.0.0.1:8000/v1 or your LAN IP for a shared team box.
Step 4 โ Configure OpenCode (opencode.jsonc)
Config path: ~/.config/opencode/opencode.jsonc (user-wide) or project-local per OpenCode docs .
llama.cpp provider block
{
"$schema" : "https://opencode.ai/config.json" ,
"provider" : {
"llama" : {
"name" : "llama.cpp (local)" ,
"npm" : "@ai-sdk/openai-compatible" ,
"options" : {
"baseURL" : "http://127.0.0.1:8080/v1" ,
"apiKey" : "local"
} ,
"models" : {
"qwen3.6-27b" : { "name" : "Qwen3.6-27B Q8 +MTP" }
}
}
} ,
"model" : "llama/qwen3.6-27b"
}
Ollama provider block
{
"$schema" : "https://opencode.ai/config.json" ,
"provider" : {
"ollama" : {
"name" : "Ollama (local)" ,
"npm" : "@ai-sdk/openai-compatible" ,
"options" : {
"baseURL" : "http://127.0.0.1:11434/v1" ,
"apiKey" : "ollama"
} ,
"models" : {
"qwen3.6-27b" : { "name" : "qwen3.6:27b" }
}
}
} ,
"model" : "ollama/qwen3.6-27b"
}
LM Studio provider block
"options" : {
"baseURL" : "http://127.0.0.1:1234/v1" ,
"apiKey" : "lm-studio"
}
Restart OpenCode after editing config, or run opencode fresh from the project directory.
Step 5 โ Cloud + local in one config (recommended)
Post-Fable tiering :
{
"$schema" : "https://opencode.ai/config.json" ,
"provider" : {
"llama" : {
"name" : "Local Qwen" ,
"npm" : "@ai-sdk/openai-compatible" ,
"options" : {
"baseURL" : "http://127.0.0.1:8080/v1" ,
"apiKey" : "local"
} ,
"models" : {
"qwen3.6-27b" : { "name" : "Qwen3.6-27B local" }
}
} ,
"zai" : {
"name" : "GLM Coding Plan" ,
"options" : {
"baseURL" : "https://api.z.ai/api/coding/paas/v4"
}
}
} ,
"model" : "llama/qwen3.6-27b"
Connect Z.AI via /connect inside TUI if you prefer not to store API keys in jsonc.
Model picks for local OpenCode (July 2026)
Model Local fit OpenCode notes Qwen 3.6 27B dense Best balance 48GB Mac / 24GB+ NvidiaBeats MoE 35B A3B on instruction following Gemma 4 12B/31B Multimodal + Apache 2.0 Slower than Qwen for pure code GLM-5.2 Unsloth Workstation / multi-GPU Closest to frontier local DeepSeek V4 Flash (quant) High tok/s if RAM allows Aggressive quants trade quality Kimi K2.7 Usually API โ too large for most locals Use API via OpenCode /connect
Alibaba cloud line: Qwen 3.7-Max for when local is not enough.
Hardware reality check
Machine Realistic local model 32GB Apple Silicon Q4 27B or Q8 7Bโ14B 48โ64GB Apple Silicon Q8 Qwen 3.6 27B (~32 tok/s MTP) 24GB Nvidia (4090/5090) Q6 27B; Q4 70B tight 64GB+ VRAM / dual GPU GLM-5.2 class, vLLM team server
Full economics: Mac vs dedicated GPU ยท closed vs open cost table .
First session checklist
Server running โ curl -s http://127.0.0.1:8080/v1/models (or Ollama 11434)
Config saved โ ~/.config/opencode/opencode.jsonc
opencode in git repo
/init โ creates AGENTS.md
Smoke prompt โ "Create a pnpm package with a hexagonal minesweeper" (same test as Quesma Qwen post )
Verify model โ ask "What model are you?" โ should not hallucinate Claude/GPT
Add verification loops from explainx.ai loops โ e.g. ci-until-green .
Troubleshooting
Symptom Fix Connection refused Inference server not running or wrong port in baseURL Empty / garbage output Quant too low โ bump Q4โQ6/Q8 (quant guide ) Slow first token Model loading; MTP helps decode โ see llama.cpp --spec-type draft-mtp Ignores package.json / structure Try dense over MoE; shorten context if RAM swapping Tool calls fail Model may lack reliable function-calling โ switch to GLM API or GPT-class for agent-heavy runs Wrong provider echo config path; /models list; check model string matches provider/id
OpenCode vs other local harnesses
Harness Local OSS models Notes OpenCode Yes โ 75+ providers + custom localDefault open multi-surface Pi Yes โ provider plugins Minimal, ownable Codex OSS Yes โ --oss + Ollama OpenAI-native tool schema Claude Code Indirect โ /config remote to desktop Not weight-local Kilo Code Yes โ BYOK + Ollama VS Code extension
OpenCode wins when you want one harness , terminal + desktop , and swap local/cloud without reinstalling .
Related on explainx.ai
Official: OpenCode ยท OpenCode config schema ยท llama.cpp ยท Ollama
Runtime ports, model tags, and OpenCode provider schema reflect July 1, 2026 docs โ verify before production. Last updated: July 1, 2026.