What is Gemma Chat in one sentence?

Gemma Chat is an open-source Electron desktop app (MIT license) that runs Google’s Gemma 4 family locally on Apple Silicon using Apple’s MLX-LM stack—aimed at offline chat and a build mode where the model writes multi-file projects with a live preview, after a one-time model download.

Who built Gemma Chat and where is the code?

Ammaar Reshi published the project on GitHub at github.com/ammaarreshi/gemma-chat. Reshi’s public profile positions him in product and design around Google AI Studio; treat the repo README and issues as the source of truth for setup and known bugs.

What hardware and software does Gemma Chat require?

The upstream README targets macOS on Apple Silicon, Python 3.10–3.13, and Node 20+. Model footprint depends on variant: the project’s own table cites roughly 1.5 GB for Gemma 4 E2B, ~3 GB for E4B (recommended default), ~8 GB for a 27B-class MoE (16 GB+ RAM suggested), and ~18 GB for a 31B class (32 GB+ RAM suggested). Verify exact filenames and revisions on the repository—you should expect these numbers to move as weights and docs update.

How do I try Gemma Chat locally?

Clone github.com/ammaarreshi/gemma-chat, run npm install and npm run dev. First launch is designed to provision Python venv, MLX-LM, and download weights (~3 GB for the recommended E4B path per README messaging). For packaged installs, npm run dist produces a .dmg per project docs.

Does 'no Wi-Fi' mean the entire developer loop is offline?

Inference and conversation can run offline after models are cached, but real product workflows usually still touch the network (package installs, docs, CI, deploy previews). Treat offline coding as a strong privacy and travel story for the model layer, not a claim that every dependency and spec lives on disk forever.

What are people reporting in early community feedback?

Public threads under the launch mention interest in routing through existing local runtimes (for example Ollama or LM Studio) and occasional first-run instability during model download—check GitHub Issues before assuming your environment matches the demo path.

Gemma Chat: offline vibe coding with Gemma 4 and MLX on Mac | explainx.ai Blog

explainx.ainewsletter3.5k

workshops ↗

Gemma Chat: offline vibe coding with Gemma 4 and MLX on Mac | explainx.ai Blog | explainx.ai

Per its README, Gemma Chat is a local-first desktop app: Electron + Vite + React 19 + TypeScript + Tailwind on the surface, MLX-LM underneath for Gemma 4 on Apple Silicon, with optional Ollama compatibility called out in the repo description. The project bills itself as “vibe code without the internet” after the initial model pull—no API keys in the local narrative, MIT license.

This article is an explainx.ai field guide: stack, model sizing, how the agent loop is described upstream, and what to validate if you fork it for your team.

TL;DR

Question	Short answer
What is it?	Desktop chat + coding agent for Gemma 4, running via MLX on Mac (Apple Silicon).
Why care?	A concrete open-source reference for offline-capable assistant UX tied to Google’s open Gemma line and Apple’s MLX runtime.
Primary source	github.com/ammaarreshi/gemma-chat
Creator signal	Ammaar Reshi—public launch thread and Google Gemma account amplification (April 2026); star/fork counts change—check the repo badge row.
License	MIT (per repository ).

Variant (as labeled upstream)	Approximate size	Notes
Gemma 4 E2B	~1.5 GB	Faster, lighter tasks
Gemma 4 E4B	~3 GB	Recommended balance in README
Gemma 4 27B MoE	~8 GB	Stronger reasoning; 16 GB+ RAM class machine
Gemma 4 31B	~18 GB	Heaviest; 32 GB+ RAM class machine

Mac model class	Recommended variant	Notes
M1 / M2 (8 GB unified)	E2B (1.5 GB)	E4B can run but may swap under load; avoid 27B/31B
M1 Pro / M2 Pro (16 GB)	E4B (3 GB) or 27B MoE (8 GB)	Comfortable for typical sessions; 27B is usable
M1 Max / M2 Max (32 GB+)	27B MoE or 31B	Full capability; watch for thermal throttling on long runs
M3 / M3 Pro / M3 Max	Same as M1/M2 equivalent	Improved efficiency may help sustained throughput

Dimension	Gemma Chat	Ollama	LM Studio	DIY MLX-LM
UI paradigm	Electron app, build + chat modes	CLI-first, server-oriented	Desktop GUI, model library	Script or notebook
Model scope	Gemma 4 family (opinionated)	Broad model zoo	Broad model library	Any MLX-compatible weights
Setup complexity	Medium (npm + Python venv + download)	Low (single binary)	Low (installer)	High (manual dependencies)
Tool/workflow integration	Built-in build mode, live preview	MCP and external tool friendly	Plugin ecosystem	Fully custom
Update cadence	Depends on maintainer activity	Frequent, vendor-backed	Frequent, commercial support	You own it

Mac configuration	Reported TPS range	Notes
M1, 8 GB	8–15 TPS	Acceptable for chat; slower for long code generation
M2, 16 GB	12–20 TPS	Comfortable for most workflows
M1 Pro, 16 GB	15–25 TPS	Good balance of speed and responsiveness
M3 Max, 32 GB+ (27B model)	10–18 TPS	Larger model trades throughput for quality

Gemma Chat: offline vibe coding with Gemma 4 and MLX on Mac

TL;DR

Related posts

MacBook vs dedicated GPU for local LLMs: how much RAM you really get, and when each wins in 2026

Ollama 0.31: Gemma 4 Is ~90% Faster on Apple Silicon With Multi-Token Prediction (No Output Change)

Cisco Antares: Open-Weight SLMs for Vulnerability Localization

What shipped

How the agent loop is described

Models and memory (from upstream table)

Getting started (upstream commands)

First-run experience and what to expect

Hardware reality check

Real-world use: when offline vibe coding actually helps

Where local-first wins

Where you still need the network

Comparing Gemma Chat to other local LLM stacks

Tradeoffs practitioners are already naming

Extending and forking: what teams should know

Custom model variants

Tool and MCP integration

Workspace sandboxing

Deployment to teams

Performance benchmarks and real-world speed

Tokens per second (approximate, E4B variant)

Build mode preview lag

Security and privacy: what the local story means

What you get

What you don't automatically get

Future directions and community roadmap

Why explainx.ai readers should care

Sources

TL;DR

Related posts

MacBook vs dedicated GPU for local LLMs: how much RAM you really get, and when each wins in 2026

Ollama 0.31: Gemma 4 Is ~90% Faster on Apple Silicon With Multi-Token Prediction (No Output Change)

Cisco Antares: Open-Weight SLMs for Vulnerability Localization

What shipped

How the agent loop is described

Models and memory (from upstream table)

Getting started (upstream commands)

First-run experience and what to expect

Hardware reality check

Real-world use: when offline vibe coding actually helps

Where local-first wins

Where you still need the network

Comparing Gemma Chat to other local LLM stacks

Tradeoffs practitioners are already naming

Extending and forking: what teams should know

Custom model variants

Tool and MCP integration

Workspace sandboxing

Deployment to teams

Performance benchmarks and real-world speed

Tokens per second (approximate, E4B variant)

Build mode preview lag

Security and privacy: what the local story means

What you get

What you don't automatically get

Future directions and community roadmap

Why explainx.ai readers should care

Related on explainx.ai

Sources