tag: quantization
skills (3 indexed)
hqq-quantization
davila7/claude-code-templates · Productivity
Fast, calibration-free weight quantization supporting 8/4/3/2/1-bit precision with multiple optimized backends.
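To make the idea concrete, here is a minimal NumPy sketch of calibration-free, group-wise low-bit weight quantization. Note this is a simplified round-to-nearest illustration of the general technique, not HQQ's actual method (HQQ refines the zero-point/scale with a half-quadratic solver) and not the library's API; all names below are illustrative.

```python
import numpy as np

def quantize_groupwise(w, nbits=4, group_size=8):
    """Round-to-nearest group-wise quantization (simplified sketch;
    real HQQ additionally optimizes scale/zero-point per group)."""
    flat = w.reshape(-1, group_size)
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    qmax = 2 ** nbits - 1
    scale = (hi - lo) / qmax
    scale[scale == 0] = 1.0  # guard all-constant groups
    q = np.clip(np.round((flat - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo, shape):
    """Reconstruct an approximate float weight from the quantized groups."""
    return (q * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16)).astype(np.float32)
q, scale, lo = quantize_groupwise(w, nbits=4, group_size=8)
w_hat = dequantize(q, scale, lo, w.shape)
err = np.abs(w - w_hat).max()  # bounded by half a group's step size
```

No calibration data is needed because scale and zero-point come from the weights alone; that is what "calibration-free" means in the card above.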
gguf-quantization
davila7/claude-code-templates · Productivity
GGUF (GPT-Generated Unified Format) is the standard file format for llama.cpp, enabling efficient inference on CPUs, Apple Silicon, and GPUs with flexible quantization options.
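A GGUF file opens with a small fixed header: the 4-byte magic `GGUF`, a uint32 format version, then uint64 tensor and metadata key-value counts, all little-endian. The sketch below packs and parses just that header with Python's stdlib; the counts are illustrative, and real files continue with metadata and tensor data after these 24 bytes.

```python
import struct

GGUF_MAGIC = b"GGUF"

def pack_gguf_header(version=3, tensor_count=0, metadata_kv_count=0):
    # magic, then little-endian uint32 version, uint64 tensor count,
    # uint64 metadata key-value count
    return GGUF_MAGIC + struct.pack("<IQQ", version, tensor_count, metadata_kv_count)

def parse_gguf_header(buf):
    """Validate the magic and unpack the three header fields."""
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensors, kvs = struct.unpack("<IQQ", buf[4:24])
    return {"version": version, "tensor_count": tensors, "metadata_kv_count": kvs}

hdr = parse_gguf_header(pack_gguf_header(tensor_count=2, metadata_kv_count=5))
```

Checking the magic bytes is a cheap way to distinguish GGUF files from the older GGML formats before attempting a full parse.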
awq-quantization
davila7/claude-code-templates · Productivity
4-bit weight quantization that preserves salient weights based on activation patterns, achieving roughly a 3x speedup with minimal accuracy loss.
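The core trick can be sketched in NumPy: scale up the input channels with large activations before quantizing, then fold the scale back out, so those salient channels get finer effective quantization steps while the product `(W·diag(s)) @ (x/s) = W @ x` is mathematically unchanged. This is a simplified per-tensor round-to-nearest illustration of that idea, not AWQ's actual group-wise kernels or its grid-searched scaling exponent; all names are illustrative.

```python
import numpy as np

def rtn_quantize(w, nbits=4):
    """Per-tensor symmetric round-to-nearest quantization (the baseline)."""
    qmax = 2 ** (nbits - 1) - 1
    step = np.abs(w).max() / qmax
    return np.clip(np.round(w / step), -qmax - 1, qmax) * step, step

def awq_style_scales(act_mag, alpha=0.5):
    """Per-input-channel scales from activation magnitudes: channels with
    large activations (salient ones) get larger s, hence finer steps."""
    s = np.maximum(act_mag, 1e-8) ** alpha
    return s / s.mean()  # keep scales centred around 1

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)          # [out, in] weight
act_mag = np.abs(rng.normal(size=64)).astype(np.float32)  # per-channel mean |x|

s = awq_style_scales(act_mag)
wq_scaled, step = rtn_quantize(w * s)  # quantize the scaled weight...
w_hat = wq_scaled / s                  # ...then fold the scale back out
eff_step = step / s                    # effective step per input channel
```

At inference the division by `s` is folded into the preceding operation, so the scaling costs nothing at runtime; the salient channels simply end up quantized on a finer grid.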