Confirm successful installation by checking the skill directory location:
.cursor/skills/llmfit-hardware-model-matcher
Restart Cursor to activate llmfit-hardware-model-matcher. Access via /llmfit-hardware-model-matcher in your agent's command palette.
Security Notice
We run automated, surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Skills execute code in your environment, so always review the skill's source code, verify the publisher's reputation, and test in isolation before production use.
llmfit detects your system's RAM, CPU, and GPU, then scores hundreds of LLMs across quality, speed, fit, and context dimensions, telling you which models will run well on your hardware. It ships with an interactive TUI and a CLI, and supports multi-GPU setups, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).
Installation
macOS / Linux (Homebrew)
brew install llmfit
Quick install script
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
# Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
Windows (Scoop)
scoop install llmfit
Docker / Podman
docker run ghcr.io/alexsjones/llmfit
# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
From source (Rust)
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary at target/release/llmfit
Core Concepts
Fit tiers: perfect (runs great), good (runs well), marginal (runs but tight), too_tight (won't run)
# All runnable models ranked by fit
llmfit fit
# Only perfect fits, top 5
llmfit fit --perfect -n 5
# JSON output
llmfit --json fit -n 10
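When scripting against the fit tiers listed above, it can help to treat them as an ordered scale. The following is a minimal sketch, not part of llmfit itself: the helper names `fit_rank` and `meets_min_fit` are hypothetical, and only the four tier names come from this document.

```shell
# Map a fit tier (names taken from the tier list above) to a sortable rank.
# Lower is better. Unknown tiers print nothing and return non-zero.
fit_rank() {
  case "$1" in
    perfect)   echo 0 ;;
    good)      echo 1 ;;
    marginal)  echo 2 ;;
    too_tight) echo 3 ;;
    *)         return 1 ;;
  esac
}

# Succeed when tier $1 is at least as good as the required minimum tier $2.
meets_min_fit() {
  local have want
  have=$(fit_rank "$1") || return 1
  want=$(fit_rank "$2") || return 1
  [ "$have" -le "$want" ]
}
```

For example, `meets_min_fit good marginal` succeeds, while `meets_min_fit marginal good` fails, so a script can gate on a minimum acceptable tier instead of comparing strings one by one.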
Model Detail
llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"
Recommendations
# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5
# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5
Hardware Planning (invert: what hardware do I need?)
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
When autodetection fails (VMs, broken nvidia-smi, passthrough setups):
# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json
# Megabytes
llmfit --memory=32000M
# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli
# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5
# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json
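If the memory override comes from a variable or a config file, a small validation step can catch typos before they reach llmfit. This is a sketch under assumptions: `memory_flag` is a hypothetical helper, and it accepts only the two value forms shown above (digits followed by `G` or `M`). llmfit may accept other forms that this check would reject.

```shell
# Build a --memory flag from an override value such as 24G or 32000M.
# Only the digits+G / digits+M forms shown above are accepted (an assumption;
# llmfit itself may be more permissive). Returns non-zero on anything else.
memory_flag() {
  local value="$1" digits
  digits="${value%[GM]}"                  # strip one trailing G or M
  [ "$digits" != "$value" ] || return 1   # reject values with no suffix
  case "$digits" in
    ''|*[!0-9]*) return 1 ;;              # reject empty or non-numeric sizes
  esac
  printf '%s\n' "--memory=$value"
}

# Usage sketch (VRAM_OVERRIDE is a hypothetical variable name):
#   flag=$(memory_flag "$VRAM_OVERRIDE") && llmfit "$flag" fit --perfect -n 5
```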
REST API Reference
Start the server:
llmfit serve --host 0.0.0.0 --port 8787
Endpoints
# Health check
curl http://localhost:8787/health
# Node hardware info
curl http://localhost:8787/api/v1/system
# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"
# Top runnable models for this node (key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"
# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
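When querying several nodes, building the "top models" URL in one place keeps the query parameters consistent. The sketch below assumes only what the examples above show: the `/api/v1/models/top` path and the `limit`, `min_fit`, and `use_case` parameters. The function name `top_models_url` and its defaults are hypothetical.

```shell
# Print the "top models" URL for one node.
# Arguments (all optional): base URL, limit, min_fit, use_case.
# Defaults mirror the curl example above; they are not llmfit defaults.
top_models_url() {
  local base="${1:-http://localhost:8787}"
  local limit="${2:-5}"
  local min_fit="${3:-good}"
  local use_case="${4:-coding}"
  printf '%s/api/v1/models/top?limit=%s&min_fit=%s&use_case=%s\n' \
    "$base" "$limit" "$min_fit" "$use_case"
}

# Usage sketch (node-a is a placeholder host name):
#   curl -fsS "$(top_models_url http://node-a:8787 3 good reasoning)"
```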
#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 | \
  jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'
Bash: Check if a specific model fits
#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
  echo "$MODEL will run well (fit: $FIT)"
else
  echo "$MODEL may not run well (fit: $FIT)"
fi
Bash: Auto-pull top Ollama model
#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"
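The auto-pull script above passes whatever `jq` prints straight to `ollama pull`. If no model is returned, `jq -r` prints the literal string `null` (or nothing at all if llmfit fails), so a guard may be worth adding. A minimal sketch, with `is_valid_model_name` as a hypothetical helper name:

```shell
# Accept a model name only if it is non-empty and not the literal "null"
# that `jq -r` prints when the selected field is missing.
is_valid_model_name() {
  case "$1" in
    ''|null) return 1 ;;
    *)       return 0 ;;
  esac
}

# Usage sketch, reusing the commands from the script above:
#   TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
#   if is_valid_model_name "$TOP_MODEL"; then
#     ollama pull "$TOP_MODEL"
#   else
#     echo "No fitting model found" >&2
#   fi
```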
- Claude Desktop or compatible AI client with skill support
- Clear understanding of the task or problem to solve
- Willingness to iterate and refine outputs
Time Estimate
15-45 minutes depending on use case complexity
Steps
1. Install the skill using the provided installation command
2. Test with a simple use case relevant to your work
3. Evaluate output quality and relevance
4. Iterate on prompts to improve results
5. Integrate into your regular workflow if valuable
Common Pitfalls
- Expecting perfect results without iteration
- Not providing enough context in prompts
- Using the skill for tasks outside its intended scope
- Accepting outputs without review and validation
Best Practices
Do
+ Start with clear, specific prompts
+ Provide relevant context and constraints
+ Review and refine all outputs before using
+ Iterate to improve output quality
+ Document successful prompt patterns
Don't
- Don't use the skill without understanding its limitations
- Don't skip validation of outputs
- Don't share sensitive information in prompts
- Don't expect the skill to replace human judgment
Pro Tips
- Be specific about desired format and style
- Ask for multiple options to choose from
- Request explanations to understand reasoning
- Combine AI efficiency with human expertise
When to Use This
Use when
Use when the skill's capabilities match your task, the time saved is clear, and you can validate the outputs. Best for repetitive tasks, learning, and quality improvement.
Avoid when
Avoid when the task requires deep expertise you can't validate, involves sensitive decisions, or when the learning process is more valuable than speed of completion.
Learning Path
1. Familiarize yourself with the skill's capabilities and limitations
2. Start with low-risk, non-critical tasks
3. Progress to more complex and valuable use cases
4. Build expertise through regular use and experimentation