llmfit-hardware-model-matcher

aradotso/trending-skills · updated Apr 8, 2026

$ npx skills add https://github.com/aradotso/trending-skills --skill llmfit-hardware-model-matcher
Summary

Skill by ara.so — Daily 2026 Skills collection.

skill.md

llmfit Hardware Model Matcher


llmfit detects your system's RAM, CPU, and GPU, then scores hundreds of LLM models across quality, speed, fit, and context dimensions, telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, and supports multi-GPU setups, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).


Installation

macOS / Linux (Homebrew)

brew install llmfit

Quick install script

curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local

Windows (Scoop)

scoop install llmfit

Docker / Podman

docker run ghcr.io/alexsjones/llmfit

# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'

From source (Rust)

git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary at target/release/llmfit

Core Concepts

  • Fit tiers: perfect (runs great), good (runs well), marginal (runs but tight), too_tight (won't run)
  • Scoring dimensions: quality, speed (tok/s estimate), fit (memory headroom), context capacity
  • Run modes: GPU, CPU+GPU offload, CPU-only, MoE
  • Quantization: automatically selects best quant (e.g. Q4_K_M, Q5_K_S, mlx-4bit) for your hardware
  • Providers: Ollama, llama.cpp, MLX, Docker Model Runner
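As a rough illustration of how a fit tier could fall out of memory headroom, here is a minimal sketch. The function name, thresholds, and formula are assumptions for illustration, not llmfit's actual scoring logic:

```python
# Hypothetical sketch: derive a fit tier from memory headroom.
# Thresholds are illustrative assumptions, not llmfit's internals.

def fit_tier(model_mem_gb: float, available_mem_gb: float) -> str:
    """Classify how comfortably a model fits in available memory."""
    if model_mem_gb > available_mem_gb:
        return "too_tight"   # won't run at all
    headroom = (available_mem_gb - model_mem_gb) / available_mem_gb
    if headroom >= 0.30:
        return "perfect"     # plenty of room left for KV cache and the OS
    if headroom >= 0.15:
        return "good"
    return "marginal"        # runs, but tight

print(fit_tier(4.2, 16.0))   # perfect
print(fit_tier(14.0, 16.0))  # marginal
```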

Key Commands

Launch Interactive TUI

llmfit

CLI Table Output

llmfit --cli

Show System Hardware Detection

llmfit system
llmfit --json system   # JSON output

List All Models

llmfit list

Search Models

llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"

Fit Analysis

# All runnable models ranked by fit
llmfit fit

# Only perfect fits, top 5
llmfit fit --perfect -n 5

# JSON output
llmfit --json fit -n 10

Model Detail

llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"

Recommendations

# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5

# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5

Hardware Planning (inverse question: what hardware do I need?)

llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
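As a back-of-envelope sketch of the kind of arithmetic behind `plan`: required memory is roughly the weights (parameters × bits per weight ÷ 8) plus a KV cache that grows linearly with context. The function and constants below are illustrative assumptions, not llmfit's internals:

```python
# Rough memory estimate for a model at a given quantization and context.
# kv_bytes_per_token (512 KiB) is an assumed per-token KV-cache cost.

def required_mem_gb(params_b: float, bits_per_weight: float, context: int,
                    kv_bytes_per_token: int = 512 * 1024) -> float:
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    kv_gb = context * kv_bytes_per_token / 1e9
    return weights_gb + kv_gb

# e.g. a 4B-parameter model at 4-bit with an 8192-token context:
print(round(required_mem_gb(4, 4, 8192), 1))  # ~6.3 GB
```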

REST API Server (for cluster scheduling)

llmfit serve
llmfit serve --host 0.0.0.0 --port 8787

Hardware Overrides

When autodetection fails (VMs, broken nvidia-smi, passthrough setups):

# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json

# Megabytes
llmfit --memory=32000M

# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"

Accepted suffixes: G/GB/GiB, M/MB/MiB, T/TB/TiB (case-insensitive).
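That suffix grammar can be sketched as a small parser. This is illustrative only; llmfit's own parsing may differ (e.g. in whether G and GiB are treated identically, as assumed here):

```python
import re

# Parse the suffix grammar above (G/GB/GiB, M/MB/MiB, T/TB/TiB,
# case-insensitive), returning a size in MiB.

_UNITS_MIB = {"m": 1, "g": 1024, "t": 1024 ** 2}

def parse_memory_mib(spec: str) -> int:
    m = re.fullmatch(r"(\d+)\s*([mgt])(i?b)?", spec.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"bad memory spec: {spec!r}")
    return int(m.group(1)) * _UNITS_MIB[m.group(2).lower()]

print(parse_memory_mib("32G"))     # 32768
print(parse_memory_mib("32000M"))  # 32000
```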

Context Length Cap

# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5

# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json

REST API Reference

Start the server:

llmfit serve --host 0.0.0.0 --port 8787

Endpoints

# Health check
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Top runnable models for this node (key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"

Query Parameters for /models and /models/top

| Param | Values | Description |
|---|---|---|
| limit / n | integer | Max rows returned |
| min_fit | perfect \| good \| marginal \| too_tight | Minimum fit tier |
| perfect | true \| false | Force perfect-only |
| runtime | any \| mlx \| llamacpp | Filter by runtime |
| use_case | general \| coding \| reasoning \| chat \| multimodal \| embedding | Use-case filter |
| provider | string | Substring match on provider |
| search | string | Free-text across name/provider/size/use-case |
| sort | score \| tps \| params \| mem \| ctx \| date \| use_case | Sort column |
| include_too_tight | true \| false | Include non-runnable models |
| max_context | integer | Per-request context cap |
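These parameters compose into a query string in the usual way; a quick sketch using only the standard library (host and port match the `serve` example above):

```python
from urllib.parse import urlencode

# Build a /models/top request URL from the parameters in the table above.
params = {"limit": 5, "min_fit": "good", "use_case": "coding", "sort": "score"}
url = "http://localhost:8787/api/v1/models/top?" + urlencode(params)
print(url)
```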

Scripting & Automation Examples

Bash: Get top coding models as JSON

#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 | \
  jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'

Bash: Check if a specific model fits

#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
  echo "$MODEL will run well (fit: $FIT)"
else
  echo "$MODEL may not run well (fit: $FIT)"
fi

Bash: Auto-pull top Ollama model

#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"

Python: Query the REST API

import requests

BASE_URL = "http://localhost:8787"

def get_system_info():
    resp = requests.get(f"{BASE_URL}/api/v1/system")
    return resp.json()

def get_top_models(use_case="coding", limit=5, min_fit="good"):
    params = {
        "use_case": use_case,
        "limit": limit,
        "min_fit": min_fit,
        "sort": "score"
    }
    resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
    return resp.json()

def search_models(query, runtime="any"):
    resp = requests.get(
        f"{BASE_URL}/api/v1/models/{query}",
        params={"runtime": runtime},
    )
    return resp.json()

Discussion

  • No comments yet — start the thread.

Ratings

4.6 · 25 reviews
  • Amina Sharma · Dec 20, 2024

    Registry listing for llmfit-hardware-model-matcher matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Naina Flores · Nov 11, 2024

    Useful defaults in llmfit-hardware-model-matcher — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Ama Abbas · Oct 18, 2024

    llmfit-hardware-model-matcher fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Neel Sethi · Oct 2, 2024

    I recommend llmfit-hardware-model-matcher for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Piyush G · Sep 13, 2024

    Registry listing for llmfit-hardware-model-matcher matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Shikha Mishra · Aug 4, 2024

    llmfit-hardware-model-matcher reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Evelyn Mehta · Jul 27, 2024

    Keeps context tight: llmfit-hardware-model-matcher is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Rahul Santra · Jul 23, 2024

    I recommend llmfit-hardware-model-matcher for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Yuki Huang · Jun 18, 2024

    llmfit-hardware-model-matcher is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Pratham Ware · Jun 14, 2024

    Useful defaults in llmfit-hardware-model-matcher — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

showing 1-10 of 25
