Technique · Updated 2026-04
Quantization
Definition
Quantization reduces the precision of numbers in an AI model to make it smaller and faster, with minimal quality loss.
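To make the definition concrete, here is a minimal sketch of symmetric 8-bit quantization using NumPy. The weight tensor is a hypothetical stand-in for one layer of a model; real frameworks apply the same idea per-channel or per-block with calibrated scales.

```python
import numpy as np

# Hypothetical weight tensor standing in for one layer of a model.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)

# Symmetric 8-bit quantization: map floats to int8 with a single scale factor.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)   # stored in 1 byte instead of 4

# Dequantize to approximate the original values at inference time.
dq = q.astype(np.float32) * scale

# Rounding error per element is bounded by half a quantization step.
max_err = np.abs(weights - dq).max()
```

Each weight now occupies one byte instead of four, at the cost of a small, bounded rounding error.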
See also in the glossary
LLM (Large Language Model)
An LLM is an AI model trained on billions of texts, capable of understanding and generating human language.
AI Inference
Inference is the process of using a trained AI model to generate predictions or responses from new data.
SLM (Small Language Model)
An SLM is a compact language model optimized to run on local devices with targeted performance on specific tasks.
GPU Cloud
GPU Cloud provides on-demand graphics processors for training and running AI models without hardware investment.
Tools that use quantization
DeepSeek
The Chinese open-source model at GPT-4 level
4.7/5
Stable Diffusion
The open-source reference for AI image generation
4.4/5
OpenClaw
The open-source AI agent that turns your LLMs into autonomous workers
4.5/5
Replit
Cloud IDE with built-in AI for coding from anywhere
4.5/5
Frequently asked questions
What's the difference between 4-bit and 8-bit quantization?
Original models store their weights as 16- or 32-bit numbers. 8-bit quantization halves the size relative to 16-bit; 4-bit quarters it. A 70B-parameter LLM in 4-bit needs roughly 35 GB for the weights alone.
Does quality drop significantly?
At 8-bit, barely noticeable. At 4-bit, slight drop on complex tasks but acceptable for most uses. At 2-bit, notable loss.