Technique (Updated 2026-04)

Model Distillation

Definition

Distillation transfers knowledge from a large model (teacher) to a smaller model (student), preserving performance at lower cost.
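The standard recipe (following Hinton et al.'s knowledge-distillation loss) trains the student on a blend of the teacher's temperature-softened output distribution and the true labels. Below is a minimal, dependency-free sketch of that loss; the function names, example logits, and default hyperparameters (T=2.0, alpha=0.5) are illustrative assumptions, not a reference implementation.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-top classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label, T=2.0, alpha=0.5):
    """Blend of a soft-target term (match the teacher at temperature T)
    and a hard-label cross-entropy term (match the ground truth)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) on the softened distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft = (T * T) * sum(pt * math.log(pt / ps)
                         for pt, ps in zip(p_teacher, p_student))
    # Standard cross-entropy against the one-hot ground-truth label.
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard

# Hypothetical logits for one training example with 3 classes:
loss = distillation_loss([1.0, 2.0, 0.5], [1.2, 2.5, 0.3], true_label=1)
```

In a real training loop this loss is computed per batch and backpropagated through the student only; the teacher's weights stay frozen.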

Frequently Asked Questions

Why distill instead of fine-tune?
Fine-tuning adapts an existing model to new data or tasks; distillation trains a new, smaller model to mimic a larger one's outputs. The distilled student is faster and cheaper to run at inference time.
Does DeepSeek use distillation?
Yes. DeepSeek distilled its larger reasoning models into compact students (for example, the DeepSeek-R1 distillations onto Qwen- and Llama-based models), which contributed to their strong cost-to-performance ratio.