Mergekit Model Merging

Mergekit is the definitive toolkit for merging pretrained large language models. With 5K+ GitHub stars, it implements every major model merging algorithm — SLERP, TIES, DARE, DARE-TIES, MoE merging, and more — enabling you to create powerful new models without any training data or GPU training time.


What is Mergekit?

Model merging is a powerful technique that combines the strengths of multiple LLMs into a single model:

  • No training required — merge happens in weight space, not through backprop

  • Combine capabilities — blend a coding model with an instruction-following model

  • Reduce weaknesses — average out individual model failures across an ensemble

  • Create Mixture of Experts — combine models into a sparse MoE architecture

  • Domain adaptation — merge base model with domain-specialized models

Mergekit implements all state-of-the-art algorithms:

| Algorithm | Description | Best For |
| --- | --- | --- |
| SLERP | Spherical linear interpolation between two models | Smooth blending of two similar models |
| TIES | Trim redundant parameters, elect signs, merge | Combining multiple models with minimal interference |
| DARE | Drop and rescale random parameters | Reducing parameter interference in large merges |
| DARE-TIES | DARE + TIES combined | Best all-around for multi-model merges |
| Linear | Simple weighted average | Quick baseline merges |
| Task Arithmetic | Add/subtract task vectors | Adding/removing specific capabilities |
| Passthrough | Copy layers directly | MoE construction |
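
The difference between linear averaging and spherical interpolation above can be made concrete with a toy example in plain Python. This illustrates the math only, not Mergekit's internals:

```python
import math

def lerp(a, b, t):
    # Linear interpolation: simple elementwise weighted average.
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t):
    # Spherical linear interpolation between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    omega = math.acos(max(-1.0, min(1.0, dot / (na * nb))))  # angle between a and b
    if omega < 1e-8:
        return lerp(a, b, t)  # nearly parallel: fall back to lerp
    so = math.sin(omega)
    return [
        (math.sin((1 - t) * omega) / so) * x + (math.sin(t * omega) / so) * y
        for x, y in zip(a, b)
    ]

w1 = [1.0, 0.0]
w2 = [0.0, 1.0]
print(lerp(w1, w2, 0.5))   # [0.5, 0.5] — the averaged vector shrinks in norm
print(slerp(w1, w2, 0.5))  # ≈ [0.7071, 0.7071] — norm is preserved
```

Norm preservation is why SLERP tends to blend two similar checkpoints more gracefully than a plain weighted average.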


Model merging is surprisingly effective. Merged models often outperform their parents on benchmarks by combining complementary knowledge. The Mergekit community on HuggingFace hosts thousands of merged models.


Server Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| GPU | Not required (CPU merge possible) | A100 40 GB for large models |
| VRAM | Not required for CPU merges | 80 GB for 70B model merges |
| RAM | 32 GB | 64 GB+ (models load into RAM) |
| CPU | 8 cores | 16+ cores |
| Storage | 100 GB | 500 GB+ |
| OS | Ubuntu 20.04+ | Ubuntu 22.04 |
| Python | 3.10+ | 3.11 |


Ports

| Port | Service | Notes |
| --- | --- | --- |
| 22 | SSH | Terminal access and file transfer |
Mergekit runs as a command-line tool — no web server needed.


Installation on Clore.ai

Step 1 — Rent a Server

  1. Filter for RAM ≥ 64 GB (critical for large model merges)

  2. Choose Storage ≥ 500 GB (merged models need space for 2-4 input models + output)

  3. GPU is optional but useful if you want to test the merged model afterward

  4. Open port 22 only

Step 2 — Connect via SSH
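
Clore.ai shows each rental's SSH host and port in the dashboard. The values below are placeholders; substitute your server's details:

```shell
ssh -p <ssh-port> root@<server-ip>
```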

Step 3 — Install Python Environment
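
A minimal setup sketch, assuming Ubuntu with Python 3.10+ and the `venv` module already present (the environment path is an arbitrary choice):

```shell
# Create and activate an isolated virtualenv for Mergekit
python3 -m venv ~/mergekit-env
source ~/mergekit-env/bin/activate
python -m pip --version   # confirm pip is available inside the venv
```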

Step 4 — Install Mergekit
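
Install from PyPI inside the activated virtualenv (a source install via `git clone` of the arcee-ai/mergekit repository also works if you want the latest changes):

```shell
pip install mergekit
```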

Step 5 — Install HuggingFace CLI
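
The HuggingFace CLI handles model downloads and uploads. Login is interactive; you will need a token from your HuggingFace account settings:

```shell
pip install -U "huggingface_hub[cli]"
# Authenticate so gated/private models can be downloaded
# (paste a token from https://huggingface.co/settings/tokens):
huggingface-cli login
```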

Step 6 — Verify Installation
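
Confirm the entry points resolve before downloading any models:

```shell
mergekit-yaml --help   # standard merge CLI
mergekit-moe --help    # MoE construction CLI
python -c "import mergekit; print('mergekit OK')"
```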


Downloading Models to Merge
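
Mergekit can resolve Hub model IDs at merge time, but pre-downloading surfaces failures early and lets you verify disk space. A sketch with example model IDs (substitute your own):

```shell
mkdir -p ./models
huggingface-cli download mistralai/Mistral-7B-v0.1 \
  --local-dir ./models/mistral-7b
huggingface-cli download teknium/OpenHermes-2.5-Mistral-7B \
  --local-dir ./models/openhermes
```

Re-running `huggingface-cli download` resumes an interrupted download rather than starting over.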


Merge Configurations

Mergekit uses YAML configuration files to define merges.

Example 1: SLERP Merge (Two Models)

SLERP blends two models along a spherical arc — best for models of the same architecture:
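
A SLERP config sketch following Mergekit's YAML schema. The model IDs are illustrative; any two checkpoints with the same architecture and layer count will work:

```yaml
# Illustrative model IDs — substitute your own same-architecture pair.
slices:
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 32]
merge_method: slerp
base_model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # per-layer-group interpolation gradient
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5                     # default t for all other tensors
dtype: bfloat16
```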

Example 2: TIES Merge (Multiple Models)

TIES handles interference between multiple merged models:
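
A TIES config sketch. `density` controls what fraction of each model's parameter deltas survive trimming; `weight` sets each model's contribution. Model IDs are illustrative Mistral-7B derivatives:

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # base model: supplies the reference weights, no parameters needed
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5   # keep the top 50% of parameter deltas
      weight: 0.5
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: bfloat16
```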

Example 3: DARE-TIES Merge (Best All-Around)
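
DARE-TIES applies DARE's random dropping and rescaling of parameter deltas before TIES-style sign election, which often gives the best results for merges of three or more models. A sketch with the same illustrative model family:

```yaml
models:
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.4
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5
      weight: 0.3
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```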

Example 4: Task Arithmetic (Add Capabilities)

Add a "skill delta" to a base model:
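
In task arithmetic, (fine-tuned model) minus (base) yields a task vector that is added back to the base with a chosen weight. A sketch (model IDs illustrative):

```yaml
models:
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 1.0   # scale of the task vector; <1.0 dilutes the skill
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
```

A negative weight subtracts a capability instead of adding it.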

Example 5: MoE (Mixture of Experts)

Combine models into a sparse MoE architecture:
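
MoE configs use a different schema and are built with the `mergekit-moe` command rather than `mergekit-yaml`. The `positive_prompts` steer the router toward each expert; model IDs are illustrative:

```yaml
base_model: mistralai/Mistral-7B-v0.1
gate_mode: hidden          # route using hidden-state similarity to the prompts
dtype: bfloat16
experts:
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "explain this concept"
      - "write an essay about"
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "solve this equation"
      - "work through the math step by step"
```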


Running the Merge

Basic Command
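
Assuming a config saved as `merge-config.yml` (filename and output path are arbitrary):

```shell
# --copy-tokenizer carries a tokenizer into the output model;
# --lazy-unpickle streams weights from disk to cut peak RAM usage.
mergekit-yaml ./merge-config.yml ./merged-model --copy-tokenizer --lazy-unpickle
# Add --cuda to run the merge arithmetic on a GPU if one is available.
```

MoE configs are run with `mergekit-moe` instead of `mergekit-yaml`.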

Monitor Progress
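
The merge prints per-tensor progress to the terminal. For resource usage, watch from a second SSH session with standard Linux tools (nothing Mergekit-specific):

```shell
htop                                 # merges are RAM- and disk-bound, not GPU-bound
watch -n 10 du -sh ./merged-model    # output shards grow as the merge writes them
```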


Testing the Merged Model
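
A quick smoke test with Transformers, assuming the merge wrote to `./merged-model` and that `transformers` and `torch` are installed (`device_map="auto"` additionally requires the `accelerate` package):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("./merged-model")
model = AutoModelForCausalLM.from_pretrained(
    "./merged-model", torch_dtype="auto", device_map="auto"
)

prompt = "Write a haiku about model merging."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```

If the output is coherent, the tokenizer and weights survived the merge; gibberish here usually points to a tokenizer mismatch.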


Publishing to HuggingFace
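
Upload with the HuggingFace CLI after logging in. The repo ID below is a placeholder; `huggingface-cli upload` creates the repository if it does not already exist:

```shell
huggingface-cli upload your-username/my-merged-model ./merged-model .
```

It is good practice to include a model card listing the parent models and the merge config so others can reproduce the merge.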


Advanced: Evolutionary Merge

Use Mergekit's evolutionary optimizer to find optimal merge weights:
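
Mergekit ships its evolutionary optimizer behind the `evolve` extra. The flags and genome-config schema change between releases, so treat the commands below as a sketch and check `mergekit-evolve --help` for your installed version:

```shell
pip install "mergekit[evolve]"   # pulls in the optimization dependencies
mergekit-evolve --help           # confirm current flags before running
# Typical shape of a run (paths are hypothetical):
# mergekit-evolve ./evol-config.yml --storage-path ./evolve-runs
```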


Troubleshooting

Out of Memory (OOM) during merge

Merges run in system RAM, not VRAM. Rent a server with more RAM, or pass `--lazy-unpickle` (and a smaller `--out-shard-size`) to `mergekit-yaml` to lower peak memory.

ValueError: models are not compatible

All input models must share the same architecture and tensor shapes (e.g. all Mistral-7B derivatives). Check each model's `config.json` for matching architecture and hidden sizes.

Merge is very slow

Merging is largely disk-bound. Use local NVMe storage rather than network volumes, and enable `--lazy-unpickle` to stream weights instead of loading everything up front.

Merged model produces gibberish

Usually a tokenizer mismatch. Re-run with `--copy-tokenizer`, or set `tokenizer_source` in the config, and make sure the source models share a vocabulary.

FileNotFoundError for model files

Verify every model finished downloading (re-running `huggingface-cli download` resumes partial downloads) and that the paths in your YAML match the local directories exactly.


General Assistant + Coding
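
A hedged `dare_ties` recipe sketch for blending an instruction-following model with a code model. The model IDs below are placeholders; substitute any two same-architecture checkpoints:

```yaml
# Placeholder model IDs — swap in real Mistral-7B derivatives.
models:
  - model: example-org/general-instruct-7b
    parameters:
      density: 0.6
      weight: 0.6
  - model: example-org/code-tuned-7b
    parameters:
      density: 0.6
      weight: 0.4
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```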

Multilingual Boost
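
The same pattern adds language coverage: merge your instruct model with a multilingual fine-tune of the same base. Model IDs are again placeholders:

```yaml
# Placeholder model IDs — use a multilingual fine-tune of the same base.
models:
  - model: example-org/english-instruct-7b
    parameters:
      density: 0.5
      weight: 0.5
  - model: example-org/multilingual-7b
    parameters:
      density: 0.5
      weight: 0.5
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
```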



Clore.ai GPU Recommendations

| Use Case | Recommended GPU | Est. Cost on Clore.ai |
| --- | --- | --- |
| Development/Testing | RTX 3090 (24 GB) | ~$0.12/gpu/hr |
| Model Merging (7B–13B) | RTX 4090 (24 GB) | ~$0.70/gpu/hr |
| Large Models (70B+) | A100 80 GB | ~$1.20/gpu/hr |
| Multi-GPU Merging | 2–4x A100 80 GB | ~$2.40–$4.80/hr |

💡 All examples in this guide can be deployed on Clore.ai GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
