llmfit Hardware Model Matcher
aradotso/trending-skills · updated Apr 8, 2026
Skill by ara.so — Daily 2026 Skills collection.
llmfit detects your system's RAM, CPU, and GPU, then scores hundreds of LLM models across quality, speed, fit, and context dimensions, telling you which models will run well on your hardware. It ships with an interactive TUI and a CLI, and supports multi-GPU setups, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).
Installation
macOS / Linux (Homebrew)
brew install llmfit
Quick install script
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
# Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
Windows (Scoop)
scoop install llmfit
Docker / Podman
docker run ghcr.io/alexsjones/llmfit
# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
From source (Rust)
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary at target/release/llmfit
Core Concepts
- Fit tiers: perfect (runs great), good (runs well), marginal (runs but tight), too_tight (won't run) (see the sketch after this list)
- Scoring dimensions: quality, speed (tok/s estimate), fit (memory headroom), context capacity
- Run modes: GPU, CPU+GPU offload, CPU-only, MoE
- Quantization: automatically selects best quant (e.g. Q4_K_M, Q5_K_S, mlx-4bit) for your hardware
- Providers: Ollama, llama.cpp, MLX, Docker Model Runner
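A quick way to see fit tiers in practice is to inspect the JSON output with jq. This is a sketch that assumes the JSON exposes a models array with name, fit, and quantization fields, as used in the scripting examples further down:
# Count runnable models per fit tier (assumed field names)
llmfit --json fit | jq -r '.models[].fit' | sort | uniq -c
# List perfect-fit models with their selected quantization
llmfit --json fit | jq -r '.models[] | select(.fit == "perfect") | "\(.name) -> \(.quantization)"'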
Key Commands
Launch Interactive TUI
llmfit
CLI Table Output
llmfit --cli
Show System Hardware Detection
llmfit system
llmfit --json system # JSON output
List All Models
llmfit list
Search Models
llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"
Fit Analysis
# All runnable models ranked by fit
llmfit fit
# Only perfect fits, top 5
llmfit fit --perfect -n 5
# JSON output
llmfit --json fit -n 10
Model Detail
llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"
Recommendations
# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5
# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5
Hardware Planning (invert: what hardware do I need?)
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
REST API Server (for cluster scheduling)
llmfit serve
llmfit serve --host 0.0.0.0 --port 8787
Hardware Overrides
When autodetection fails (VMs, broken nvidia-smi, passthrough setups):
# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json
# Megabytes
llmfit --memory=32000M
# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"
Accepted suffixes: G/GB/GiB, M/MB/MiB, T/TB/TiB (case-insensitive).
Context Length Cap
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli
# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5
# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json
REST API Reference
Start the server:
llmfit serve --host 0.0.0.0 --port 8787
Endpoints
# Health check
curl http://localhost:8787/health
# Node hardware info
curl http://localhost:8787/api/v1/system
# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"
# Top runnable models for this node (key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"
# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
Query Parameters for /models and /models/top
| Param | Values | Description |
|---|---|---|
| limit / n | integer | Max rows returned |
| min_fit | perfect \| good \| marginal \| too_tight | Minimum fit tier |
| perfect | true \| false | Force perfect-only |
| runtime | any \| mlx \| llamacpp | Filter by runtime |
| use_case | general \| coding \| reasoning \| chat \| multimodal \| embedding | Use case filter |
| provider | string | Substring match on provider |
| search | string | Free-text across name/provider/size/use-case |
| sort | score \| tps \| params \| mem \| ctx \| date \| use_case | Sort column |
| include_too_tight | true \| false | Include non-runnable models |
| max_context | integer | Per-request context cap |
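Putting several parameters together on the scheduling endpoint (all values taken from the table above; host and port match the serve example):
# Top 10 good-or-better coding models on this node, sorted by estimated tok/s,
# capped at an 8K context per request
curl "http://localhost:8787/api/v1/models/top?limit=10&min_fit=good&use_case=coding&sort=tps&max_context=8192"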
Scripting & Automation Examples
Bash: Get top coding models as JSON
#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 | \
jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'
Bash: Check if a specific model fits
#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
echo "$MODEL will run well (fit: $FIT)"
else
echo "$MODEL may not run well (fit: $FIT)"
fi
Bash: Auto-pull top Ollama model
#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"
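Bash: Pick the best node in a cluster
Since the REST API is intended for cluster scheduling, here is a minimal scheduling sketch: it queries /api/v1/models/top on a set of nodes (the hostnames are placeholders) and picks the node whose best-fitting model reports the highest score. It assumes each response exposes models[0].name and models[0].score, matching the fields used in the recommend example above.
#!/bin/bash
# Hypothetical node list; each node runs `llmfit serve --host 0.0.0.0 --port 8787`
NODES=("node-a.local" "node-b.local" "node-c.local")
BEST_NODE=""
BEST_SCORE=0
for NODE in "${NODES[@]}"; do
  RESP=$(curl -s "http://$NODE:8787/api/v1/models/top?limit=1&min_fit=good&use_case=coding")
  SCORE=$(echo "$RESP" | jq -r '.models[0].score // 0')
  NAME=$(echo "$RESP" | jq -r '.models[0].name // "none"')
  echo "$NODE -> $NAME ($SCORE)"
  if (( $(echo "$SCORE > $BEST_SCORE" | bc -l) )); then
    BEST_SCORE=$SCORE
    BEST_NODE=$NODE
  fi
done
echo "Schedule on: $BEST_NODE"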
Python: Query the REST API
import requests

BASE_URL = "http://localhost:8787"

def get_system_info():
    # Node hardware info
    resp = requests.get(f"{BASE_URL}/api/v1/system")
    return resp.json()

def get_top_models(use_case="coding", limit=5, min_fit="good"):
    # Top runnable models for this node
    params = {
        "use_case": use_case,
        "limit": limit,
        "min_fit": min_fit,
        "sort": "score",
    }
    resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
    return resp.json()

def search_models(query, runtime="any"):
    # Search by model name/provider, e.g. search_models("Mistral")
    resp = requests.get(
        f"{BASE_URL}/api/v1/models/{query}",
        params={"runtime": runtime},
    )
    return resp.json()