LFM2.5-230M: Liquid AI's 230M Model Built to Run Agents on Phones and Robots

Liquid AI released LFM2.5-230M on June 25, 2026 — a 230M-parameter open-weight model with 213 tok/s on Galaxy S25 Ultra and 42 tok/s on Raspberry Pi 5. Built for tool use, data extraction, and on-device agentic workloads.

Jun 26, 2026 · 10 min read · Yash Thakker
Tags: Liquid AI, Edge AI, On-Device AI, Small Models, Robotics, Open Source

On June 25, 2026, Liquid AI released LFM2.5-230M — its smallest foundation model yet, and one of the clearest 2026 statements about where the edge-AI market is heading: not bigger models in the cloud, but fast, open-weight models that run agentic tool loops on the device you already have.

Liquid AI's framing on X (@liquidai) and in the official blog post is explicit: LFM2.5-230M is built to run anywhere — cloud GPUs, phone CPUs, Raspberry Pi boards, and robot onboard computers — and to power data extraction pipelines and lightweight on-device agentic workloads, not frontier math or long-form creative writing.


TL;DR

| Spec | LFM2.5-230M |
| --- | --- |
| Parameters | 230M (smallest in LFM2.5 family) |
| Architecture | LFM2 (Liquid Foundation Model v2) |
| Pre-training | 19T tokens + 32K context extension |
| Post-training | SFT (distilled from LFM2.5-350M) → DPO → multi-domain RL |
| Variants | LFM2.5-230M-Base, LFM2.5-230M (post-trained) |
| Availability | Hugging Face (open-weight) |
| Phone CPU decode | 213 tok/s (Galaxy S25 Ultra, Snapdragon Gen4) |
| Pi 5 CPU decode | 42 tok/s (Raspberry Pi 5) |
| Best for | Tool use, data extraction, instruction following |
| Avoid for | Advanced math, code generation, creative writing |
| Inference | llama.cpp, MLX, vLLM, SGLang, ONNX |

Why Liquid AI Built a 230M Model

The small-model landscape in mid-2026 splits into two camps:

  1. Reasoning specialists — models like VibeThinker-3B that compress verifiable math and coding into compact parameter counts.
  2. Edge agents — models optimized for speed, tool calling, and structured extraction on constrained hardware.

LFM2.5-230M is firmly in the second camp. Liquid AI is not trying to beat Claude Fable 5 on SWE-Bench. It is trying to make "hold still for 2 seconds, then walk forward at 1 meter per second" parse into a valid multi-step robot skill plan — on a Jetson Orin, with no cloud round-trip.

That use case — natural language → structured tool calls → physical action — is the same pattern emerging across home automation, industrial IoT, and phone-based agents. The bottleneck is not raw IQ. It is latency, memory footprint, and inference cost per tool loop.


Training Recipe

Liquid AI's post-training pipeline is designed to preserve downstream fine-tuning flexibility while shipping strong default capability:

Stage 1: Supervised fine-tuning with distillation

The 230M model learns from LFM2.5-350M — a larger sibling in the same architecture family. Distillation from a bigger in-family model is a proven pattern for small models: the teacher provides richer supervision signals than raw pre-training alone, without requiring the student to match the teacher on every task class.
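To make the idea of "richer supervision signals" concrete, here is a minimal sketch of one common form of distillation: a KL-divergence loss between the teacher's and the student's next-token distributions. The post does not say which distillation objective Liquid AI uses, so treat the loss form, the temperature, and the toy logits as assumptions for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over a single token's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy 4-token vocabulary; these logits are made up for illustration.
teacher = [2.0, 1.0, 0.1, -1.0]
close_student = [1.9, 1.1, 0.0, -0.9]
far_student = [-1.0, 0.1, 1.0, 2.0]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student: {loss_close:.4f}, far student: {loss_far:.4f}")
```

The student that ranks tokens like the teacher gets a much smaller loss, which is the signal a hard one-hot label cannot provide.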

Stage 2: Direct preference optimization (DPO)

DPO aligns the model with human-preferred outputs without a separate reward model training loop — lighter-weight than classic RLHF for a model this size.
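A minimal sketch of the standard DPO loss for a single preference pair shows why no separate reward model is needed: the loss only uses sequence log-probabilities from the policy and from a frozen reference model. The log-probability values and beta=0.1 below are illustrative assumptions, not Liquid AI's settings.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a full response
    under either the trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    return math.log1p(math.exp(-z))

# Policy identical to the reference: no preference learned yet.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability toward the chosen response.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(f"baseline: {baseline:.4f}, improved: {improved:.4f}")
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.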

Stage 3: Multi-domain reinforcement learning

RL across multiple domains pushes tool-use and extraction behavior beyond what SFT alone achieves — similar in spirit to the multi-domain RL stage in other 2026 small-model pipelines, but tuned for applied tasks rather than competition math.

The base checkpoint (LFM2.5-230M-Base) skips post-training for developers who want a clean starting point for domain-specific fine-tunes.


Benchmarks: Beats Models Twice Its Size — on the Right Tasks

Liquid AI evaluated LFM2.5-230M across ten benchmarks. The headline from the blog post: despite 230M parameters, it competes with and often beats models more than twice as large on instruction following, data extraction, and tool use.

Knowledge and instruction following

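| Model | GPQA Diamond | MMLU-Pro | IFEval | IFBench | Multi-IF |
| --- | --- | --- | --- | --- | --- |
| LFM2.5-230M | 25.41 | 20.25 | 71.71 | 38.40 | 37.70 |
| LFM2.5-350M | 30.64 | 20.01 | 76.96 | 40.69 | 44.92 |
| LFM2-350M | 27.58 | 19.29 | 64.96 | 18.20 | 32.92 |
| Granite 4.0-H-350M | 22.32 | 13.14 | 61.27 | 17.22 | 28.70 |
| Qwen3.5-0.8B (Instruct) | 27.41 | 37.42 | 59.94 | 22.87 | 41.68 |
| Gemma 3 1B IT | 23.89 | 14.04 | 63.49 | 20.33 | 44.25 |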

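On IFEval and IFBench, LFM2.5-230M leads Gemma 3 1B and Qwen3.5-0.8B despite being 3–4× smaller (71.71 vs. 63.49 and 59.94 on IFEval; 38.40 vs. 20.33 and 22.87 on IFBench). On Multi-IF it trails both (37.70 vs. 44.25 and 41.68), and on broad knowledge (MMLU-Pro) Qwen3.5-0.8B still wins by a wide margin (37.42 vs. 20.25). That split is consistent with the Parametric Compression-Coverage pattern: knowledge coverage scales with parameters differently than instruction-following discipline.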

Tool use and data extraction

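| Model | CaseReportBench | BFCLv3 | BFCLv4 | τ²-Bench Telecom | τ²-Bench Retail |
| --- | --- | --- | --- | --- | --- |
| LFM2.5-230M | 22.51 | 43.26 | 21.03 | 5.26 | 13.68 |
| LFM2.5-350M | 32.45 | 44.11 | 21.86 | 18.86 | 17.84 |
| LFM2-350M | 11.67 | 22.95 | 12.29 | 10.82 | 5.56 |
| Granite 4.0-H-350M | 12.44 | 43.07 | 13.28 | 13.74 | 6.14 |
| Qwen3.5-0.8B (Instruct) | 13.83 | 35.08 | 18.70 | 12.57 | 6.14 |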

BFCLv3 (Berkeley Function Calling Leaderboard) scores above 43 put LFM2.5-230M in the same tier as Granite 4.0-H-350M — a model with ~50% more parameters. CaseReportBench (structured medical/clinical data extraction) at 22.51 beats Qwen3.5-0.8B (13.83) and LFM2-350M (11.67) by wide margins.
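The size and score comparisons above are easy to check by hand. The sketch below uses the scores from the table and nominal parameter counts read off the model names (230M, 350M, 0.8B); exact parameter counts may differ, so the ratios are approximate.

```python
# Nominal parameter counts in millions, inferred from model names (an assumption).
PARAMS_M = {
    "LFM2.5-230M": 230,
    "LFM2-350M": 350,
    "Granite 4.0-H-350M": 350,
    "Qwen3.5-0.8B": 800,
}

# Scores copied from the tool-use and extraction table.
BFCL_V3 = {"LFM2.5-230M": 43.26, "Granite 4.0-H-350M": 43.07}
CASE_REPORT = {"LFM2.5-230M": 22.51, "Qwen3.5-0.8B": 13.83, "LFM2-350M": 11.67}

def size_ratio(model, baseline="LFM2.5-230M"):
    """How many times larger `model` is than the baseline."""
    return PARAMS_M[model] / PARAMS_M[baseline]

def score_gap(scores, model, baseline="LFM2.5-230M"):
    """Baseline score minus the other model's score, in points."""
    return scores[baseline] - scores[model]

print(f"Granite is {size_ratio('Granite 4.0-H-350M'):.2f}x the size, "
      f"BFCLv3 gap {score_gap(BFCL_V3, 'Granite 4.0-H-350M'):+.2f}")
print(f"Qwen is {size_ratio('Qwen3.5-0.8B'):.2f}x the size, "
      f"CaseReportBench gap {score_gap(CASE_REPORT, 'Qwen3.5-0.8B'):+.2f}")
```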

The τ²-Bench Telecom score is the weak spot: at 5.26, LFM2.5-230M is the lowest of the five models in the table, and no model listed clears 19. Multi-turn customer-service simulation is hard at this scale. Retail is relatively stronger (13.68, second only to LFM2.5-350M at 17.84), suggesting the model handles simpler structured agent scenarios better than long conversational tool chains.


CPU Speed: 213 tok/s on a Phone, 42 tok/s on a Pi

Raw benchmark scores matter less if inference is too slow for real-time agents. Liquid AI's CPU numbers are the release's most practical signal:

| Platform | Hardware | Decode throughput |
| --- | --- | --- |
| Samsung Galaxy S25 Ultra | Qualcomm Snapdragon Gen4 (CPU) | 213 tok/s |
| Raspberry Pi 5 | ARM CPU | 42 tok/s |

Liquid AI compares LFM2.5-230M against similar-sized attention-based and hybrid models (SSM hybrids, Gated Delta Networks) and reports the highest prefill and decode throughput in its class with the smallest memory footprint.
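To translate those throughput figures into something an agent designer can budget against, here is a back-of-the-envelope sketch. It assumes a tool call of 64 output tokens (a made-up but plausible size) and counts decode time only, ignoring prefill and any tool execution time.

```python
# Decode throughput reported by Liquid AI, in tokens per second.
DECODE_TOK_PER_S = {
    "Galaxy S25 Ultra (CPU)": 213.0,
    "Raspberry Pi 5 (CPU)": 42.0,
}

def decode_seconds(output_tokens, tok_per_s):
    """Time to generate `output_tokens` at a steady decode rate."""
    return output_tokens / tok_per_s

TOOL_CALL_TOKENS = 64  # assumed size of one structured tool call

for device, rate in DECODE_TOK_PER_S.items():
    seconds = decode_seconds(TOOL_CALL_TOKENS, rate)
    print(f"{device}: {seconds:.2f} s per {TOOL_CALL_TOKENS}-token tool call")
```

Under these assumptions a single tool call decodes in roughly a third of a second on the phone and about a second and a half on the Pi, which is the difference between a loop that feels instant and one that feels deliberate.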

Flash-attention tuning is device-specific: enabled (-fa 1) on Raspberry Pi 5, disabled (-fa 0) on Snapdragon Gen4 — a reminder that edge deployment is as much about per-platform tuning as model selection. See our quantization guide for the broader stack of techniques that make sub-billion models viable on consumer hardware.
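One way to keep that per-platform tuning from living in someone's shell history is a small device-profile table. The sketch below assumes the `-fa` values are passed to llama.cpp's `llama-bench` tool; the device keys, the thread count, and the GGUF filename are placeholders, not names from Liquid AI's release.

```python
import shlex

# Flash-attention settings reported in the post; the keys are our own labels.
FLASH_ATTN = {
    "raspberry-pi-5": 1,   # enabled (-fa 1)
    "snapdragon-gen4": 0,  # disabled (-fa 0)
}

def bench_command(device, model_path, threads=4):
    """Build a llama-bench argument list for one device profile."""
    return [
        "llama-bench",
        "-m", model_path,
        "-t", str(threads),
        "-fa", str(FLASH_ATTN[device]),
    ]

# "lfm2.5-230m.gguf" is a placeholder filename for illustration.
for device in FLASH_ATTN:
    print(shlex.join(bench_command(device, "lfm2.5-230m.gguf")))
```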


Inference Ecosystem: Day-One Support

LFM2.5-230M ships with checkpoints across the edge-to-cloud inference stack:

| Runtime | Use case |
| --- | --- |
| llama.cpp | GGUF checkpoints for Raspberry Pi, phones, embedded |
| MLX | Apple Silicon (Mac, iPhone via future MLX ports) |
| vLLM / SGLang | GPU-accelerated production serving |
| ONNX | Cross-platform deployment across diverse accelerators |

For production GPU serving, Liquid AI also benchmarks an internal inference stack against SGLang-served competitors — reporting lower end-to-end latency across concurrency levels for LFM2.5 models.


Unitree G1 Demo: Natural Language → Robot Skills

The most visually compelling demo in the release is not a benchmark table — it is a Unitree G1 humanoid robot running LFM2.5-230M entirely on-device on its onboard NVIDIA Jetson Orin.

The architecture:

  1. User speaks a free-form natural-language command.
  2. LFM2.5-230M (after a quick fine-tune) acts as a skill-selection layer.
  3. The model decomposes the instruction into a sequence of tool calls.
  4. Each tool call invokes a pre-trained low-level skill from NVIDIA's SONIC framework — timed walking, velocity targets, one-legged kneel holds, etc.

Example command from Liquid AI's blog:

"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters"

The model outputs a structured multi-step plan chaining skills like timed walking and kneel holds — without cloud inference.
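As a rough illustration of what such a plan could look like on the receiving end, here is a sketch that validates a JSON plan for the example command and estimates its duration. The skill names, argument names, and JSON layout are hypothetical; they are not SONIC's actual API or Liquid AI's output format.

```python
import json

# Hypothetical skill registry: skill name -> required argument names.
SKILLS = {
    "hold_still": {"seconds"},
    "walk": {"velocity_mps", "distance_m"},
    "kneel_hold": {"seconds"},
}

def validate_plan(raw):
    """Parse a JSON plan and reject unknown skills or malformed arguments."""
    plan = json.loads(raw)
    for step in plan:
        name, args = step["skill"], step["args"]
        if name not in SKILLS:
            raise ValueError(f"unknown skill: {name}")
        if set(args) != SKILLS[name]:
            raise ValueError(f"bad arguments for {name}: {sorted(args)}")
        if name == "walk" and args["velocity_mps"] == 0:
            raise ValueError("walk velocity must be non-zero")
    return plan

def plan_duration(plan):
    """Total seconds the plan takes; negative velocity means walking backward."""
    total = 0.0
    for step in plan:
        args = step["args"]
        if step["skill"] == "walk":
            total += args["distance_m"] / abs(args["velocity_mps"])
        else:
            total += args["seconds"]
    return total

# What a model response for the example command might look like (illustrative).
MODEL_OUTPUT = """[
  {"skill": "hold_still", "args": {"seconds": 2}},
  {"skill": "walk", "args": {"velocity_mps": 1.0, "distance_m": 3}},
  {"skill": "kneel_hold", "args": {"seconds": 5}},
  {"skill": "walk", "args": {"velocity_mps": -0.5, "distance_m": 3}}
]"""

plan = validate_plan(MODEL_OUTPUT)
print(f"{len(plan)} steps, about {plan_duration(plan):.0f} s in total")
```

Validating before executing matters more on a robot than in a chat window: a malformed step should be rejected by code, not discovered by the hardware.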

This parallels the Gemma 4 + Open Duck Mini demo at Google I/O 2026 — but with a different model class: 230M parameters focused on tool decomposition rather than 2B multimodal conversation. Both demos point to the same product direction — robots and edge devices need a language-to-action compiler, not a chatbot.

Liquid AI's demo video: YouTube Shorts — Unitree G1 + LFM2.5-230M


Where LFM2.5-230M Fits in the Small-Model Landscape

| Model | Params | Strength | Weakness |
| --- | --- | --- | --- |
| LFM2.5-230M | 230M | Speed, tool use, extraction, edge agents | Math, code, creative writing |
| MiniCPM5-1B | 1B | Broad open-model intelligence at 0.5GB | Heavier than 230M for pure tool loops |
| VibeThinker-3B | 3B | AIME 94.3, frontier verifiable reasoning | Too large for Pi-class real-time agents |
| Gemma 4 E2B | 2B | Multimodal on-device (vision + speech) | Different deployment path (LiteRT) |

Liquid AI's honest limitation statement is refreshing: do not use LFM2.5-230M for advanced math, code generation, or creative writing. That clarity helps developers route tasks correctly — use a 230M model for the tool-selection layer in a pipeline, and call a larger model (or cloud API) only when the subtask requires it.
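That routing advice can be sketched as a few lines of dispatch logic. The task labels and the two route names below are hypothetical; the split itself follows the "best for" and "avoid for" lists from Liquid AI's own guidance. Sending unknown task types to the larger model is a conservative default we chose, not something the release specifies.

```python
# Task classes taken from the "Best for" and "Avoid for" rows in the TL;DR.
ON_DEVICE_TASKS = {"tool_use", "data_extraction", "instruction_following"}
ESCALATE_TASKS = {"advanced_math", "code_generation", "creative_writing"}

def route(task_type):
    """Pick which model tier should handle a subtask."""
    if task_type in ON_DEVICE_TASKS:
        return "on_device_230m"
    # Known-weak tasks and anything unrecognized both go to the larger model.
    return "larger_model"

for task in ["tool_use", "code_generation", "translation"]:
    print(f"{task} -> {route(task)}")
```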

For agentic coding on developer machines, models like Claude Opus 4.8 or OpenRouter Fusion remain the practical choice while Fable 5 stays suspended. LFM2.5-230M targets a different surface: phones, robots, home automation, and high-volume extraction pipelines where cost and latency dominate.


Get Started

Both checkpoints are available now:

  • LFM2.5-230M — post-trained, ready for tool-use and extraction workloads
  • LFM2.5-230M-Base — pre-trained base for custom fine-tuning

Download from Hugging Face and follow Liquid AI's documentation for local run and fine-tune instructions.

Liquid AI's broader LFM2.5 family spans base models, audio variants, and vision variants under one architecture — positioning the company as an efficiency-first alternative to scaling-parameter frontier labs.


Related ExplainX coverage

| Post | Connection |
| --- | --- |
| Gemma 4 + Open Duck Mini | On-device robot demo on Pi 5 and Jetson Orin |
| MiniCPM5-1B | Another open small-model breakthrough at sub-1B scale |
| VibeThinker-3B | Opposite end: frontier reasoning in a compact model |
| AI Model Quantization Guide | How sub-billion models run on phones and edge boards |
| NVIDIA N1X at Computex 2026 | On-device AI compute trend on consumer hardware |

Summary

LFM2.5-230M is Liquid AI's bet that the next wave of useful AI is not another 100B-parameter cloud model — it is a 230M-parameter open-weight agent that runs at 213 tok/s on a phone CPU, parses natural language into tool calls on a humanoid robot, and beats models twice its size on instruction following and data extraction.

It is explicitly not a reasoning or coding model. It is an edge agent compiler — fast, small, and deployable everywhere from a Raspberry Pi to a Jetson Orin to a Snapdragon phone.


Last updated: June 26, 2026. Specs and benchmarks sourced from Liquid AI's blog post and @liquidai on X, published June 25, 2026.
