model-merging

davila7/claude-code-templates · updated Apr 8, 2026

$ npx skills add https://github.com/davila7/claude-code-templates --skill model-merging

skill.md

Model Merging: Combining Pre-trained Models

When to Use This Skill

Use Model Merging when you need to:

  • Combine capabilities from multiple fine-tuned models without retraining
  • Create specialized models by blending domain-specific expertise (math + coding + chat)
  • Improve performance beyond any single parent model (often +5-10% on benchmarks)
  • Reduce training costs - no GPUs needed, merges run on CPU
  • Experiment rapidly - create new model variants in minutes, not days
  • Preserve multiple skills - merge without catastrophic forgetting

Success stories: Marcoro14-7B-slerp topped the Open LLM Leaderboard in February 2024, and many leading models on Hugging Face are produced by merging.

Tools: mergekit (Arcee AI), LazyMergekit, Model Soup

Installation

# Install mergekit
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# Or via pip
pip install mergekit

# Optional: Transformers library for loading and testing merged models
pip install transformers torch

Quick Start

Simple Linear Merge

# config.yml - Merge two models with equal weights
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
dtype: bfloat16

# Run merge
mergekit-yaml config.yml ./merged-model --cuda

# Load the merged model with Transformers
python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('./merged-model')"

SLERP Merge (Best for 2 Models)

# config.yml - Spherical interpolation
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # Interpolation factor (0=model1, 1=model2)
dtype: bfloat16

Core Concepts

1. Merge Methods

Linear (Model Soup)

  • Simple weighted average of parameters
  • Fast, works well for similar models
  • Can merge 2+ models
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
# where w1 + w2 + w3 = 1
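
A minimal runnable sketch of this weighted average over PyTorch state dicts, assuming identical architectures (the function name and setup are illustrative, not mergekit's API):

import torch

def linear_merge(state_dicts, weights):
    """Weighted average of aligned parameter tensors; weights should sum to 1."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# e.g. merged = linear_merge([m1.state_dict(), m2.state_dict()], [0.5, 0.5])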

SLERP (Spherical Linear Interpolation)

  • Interpolates along sphere in weight space
  • Preserves magnitude of weight vectors
  • Best for merging 2 models
  • Smoother than linear
# SLERP formula
merged = (sin((1-t)*θ) / sin(θ)) * model1 + (sin(t*θ) / sin(θ)) * model2
# where θ = arccos(dot(model1, model2))
# t ∈ [0, 1]
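
The same formula as a hedged PyTorch sketch over a pair of flattened weight tensors (illustrative only; mergekit applies its own per-tensor handling):

import torch

def slerp(t, v1, v2, eps=1e-8):
    """Spherically interpolate between two flattened weight tensors."""
    cos_theta = torch.dot(v1 / (v1.norm() + eps), v2 / (v2.norm() + eps))
    theta = torch.arccos(cos_theta.clamp(-1.0, 1.0))
    if theta.abs() < 1e-4:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * v1 + t * v2
    return (torch.sin((1 - t) * theta) * v1 + torch.sin(t * theta) * v2) / torch.sin(theta)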

Task Arithmetic

  • Extract "task vectors" (fine-tuned - base)
  • Combine task vectors, add to base
  • Good for merging multiple specialized models
# Task vector
task_vector = finetuned_model - base_model

# Merge multiple task vectors
merged = base_model + α₁*task_vector₁ + α₂*task_vector₂
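
A runnable version of the same idea over state dicts (names here are illustrative; in mergekit this corresponds to the task_arithmetic merge method):

def merge_task_vectors(base_sd, finetuned_sds, alphas):
    """Add scaled task vectors (fine-tuned minus base) onto the base weights."""
    merged = {k: v.float().clone() for k, v in base_sd.items()}
    for alpha, sd in zip(alphas, finetuned_sds):
        for k in merged:
            merged[k] += alpha * (sd[k].float() - base_sd[k].float())
    return merged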

TIES-Merging

  • Task arithmetic + sparsification
  • Resolves sign conflicts in parameters
  • Best for merging many task-specific models
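
A simplified per-tensor sketch of the TIES procedure (trim each task vector to a target density, elect a majority sign, then average only the entries that agree); illustrative only, not mergekit's internal implementation:

import torch

def ties_merge_tensor(task_vectors, density=0.5):
    """Merge task vectors for one tensor: trim, elect sign, disjoint mean."""
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(density * tv.numel()))  # keep top-k entries by magnitude
        thresh = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv)))
    stacked = torch.stack(trimmed)
    elected = torch.sign(stacked.sum(dim=0))  # majority sign per parameter
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    return (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)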

DARE (Drop And REscale)

  • Randomly drops fine-tuned parameters
  • Rescales remaining parameters
  • Reduces redundancy, maintains performance
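
A hedged sketch of the drop-and-rescale step on one task vector (drop_rate is the fraction of entries zeroed; illustrative, not mergekit's code):

import torch

def dare(task_vector, drop_rate=0.9):
    """Randomly zero task-vector entries, rescale survivors by 1/(1 - p)."""
    keep_mask = torch.bernoulli(torch.full_like(task_vector, 1 - drop_rate))
    return task_vector * keep_mask / (1 - drop_rate)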

2. Configuration Structure

# Basic structure
merge_method: <method>  # linear, slerp, ties, dare_ties, task_arithmetic
base_model: <path>      # Optional: base model for task arithmetic

models:
  - model: <path/to/model1>
    parameters:
      weight: <float>   # Merge weight
      density: <float>  # For TIES/DARE

  - model: <path/to/model2>
    parameters:
      weight: <float>

parameters:
  # Method-specific parameters

dtype: <dtype>  # bfloat16, float16, float32

# Optional
slices:  # Layer-wise merging
tokenizer:  # Tokenizer configuration

Merge Methods Guide

Linear Merge

Best for: Simple model combinations, equal weighting

merge_method: linear
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.4
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.3
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.3
dtype: bfloat16

SLERP Merge

Best for: Two models, smooth interpolation

merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # 0.0 = first model, 1.0 = second model
dtype: bfloat16

Layer-specific SLERP:

merge_method: slerp
base_model: model_a
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn    # Attention layers
      value: 0.3
    - filter: mlp          # MLP layers
      value: 0.7
    - value: 0.5           # Default for other layers
dtype: bfloat16

Task Arithmetic

Best for: Combining specialized skills

merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1  # Math
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B  # Chat
    parameters:
      weight: 0.3
  - model: ajibawa-2023/Code-Mistral-7B  # Code
    parameters:
      weight: 0.2
dtype: bfloat16

TIES-Merging

Best for: Many models, resolving conflicts

merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.5
      density: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16


Ratings

4.5 average · 40 reviews
  • Kiara Dixit · Dec 28, 2024

    We added model-merging from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Michael Nasser · Dec 24, 2024

    Keeps context tight: model-merging is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Kiara Khanna · Dec 4, 2024

    I recommend model-merging for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Hiroshi Verma · Nov 19, 2024

    model-merging reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Michael Patel · Nov 15, 2024

    model-merging has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Chaitanya Patil · Oct 22, 2024

    Solid pick for teams standardizing on skills: model-merging is focused, and the summary matches what you get after install.

  • Neel Kim · Oct 10, 2024

    Registry listing for model-merging matched our evaluation — installs cleanly and behaves as described in the markdown.
