Technique · Updated 2026-04

Quantization

Definition

Quantization reduces the precision of numbers in an AI model to make it smaller and faster, with minimal quality loss.

Frequently Asked Questions

What's the difference between 4-bit and 8-bit quantization?
Models are typically trained and stored with 16- or 32-bit numbers. Relative to a 16-bit model, 8-bit quantization roughly halves the size and 4-bit quarters it. A 70B LLM at 4 bits per weight needs about 35 GB for the weights alone, so it can run on a machine with 48 GB of RAM rather than the 140 GB a 16-bit copy would require.
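The size figures above follow directly from bits-per-weight arithmetic. A minimal sketch (the function name is illustrative, and it counts weights only, ignoring KV cache and runtime overhead):

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Decimal GB needed to store `params_billion` parameters at `bits` per weight."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9

# Weight storage for a 70B-parameter model at common precisions
for bits in (32, 16, 8, 4):
    print(f"70B model at {bits}-bit: {weight_memory_gb(70, bits):.0f} GB")
```

This prints 280, 140, 70, and 35 GB respectively; actual deployments need extra headroom for activations and the KV cache.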
Does quality drop significantly?
At 8-bit, the loss is barely noticeable. At 4-bit, there is a slight drop on complex tasks, but it is acceptable for most uses. At 2-bit, the loss is notable.
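Why lower bit widths lose more quality can be seen in a toy round trip. This is a hedged sketch of symmetric per-tensor quantization on a plain Python list; production schemes (e.g. GPTQ, AWQ) use per-channel or per-group scales and calibration, so the numbers here only illustrate the trend:

```python
def quantize(weights, bits=8):
    """Map floats to signed integers using one shared scale (symmetric scheme)."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in q]

w = [0.82, -0.41, 0.07, -1.30, 0.55]        # toy weight values
q8, s8 = quantize(w, bits=8)
q4, s4 = quantize(w, bits=4)
err8 = max(abs(a - b) for a, b in zip(w, dequantize(q8, s8)))
err4 = max(abs(a - b) for a, b in zip(w, dequantize(q4, s4)))
print(err8, err4)
```

With only 15 integer levels available at 4-bit versus 255 at 8-bit, the worst-case rounding error grows by roughly the same factor, which is why 2-bit (3 levels per sign) degrades sharply.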