Model · Updated 2026-04

Mixture of Experts (MoE)

Definition

MoE is a model architecture that activates only a fraction of its parameters for each request, so a model can keep a very large total parameter count while the compute cost per request stays much smaller.

Frequently Asked Questions

How does MoE work?
The model contains multiple specialized "experts", typically parallel feed-forward sub-networks. A learned router decides which experts to activate for each token of a request. The result: a model with, say, 1T total parameters might activate only around 100B of them per request.
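
To make the routing step concrete, here is a minimal sketch of top-k expert routing in PyTorch. The names (Expert, MoELayer) and the sizes (d_model, num_experts, top_k) are illustrative assumptions, not the configuration of any model named below.

```python
# Minimal MoE sketch: a learned router picks top-k experts per token.
# All class names and dimensions are illustrative, not from any real model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward "expert" sub-network."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            Expert(d_model, d_hidden) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # learned gate
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k)  # pick top-k per token
        weights = F.softmax(weights, dim=-1)           # normalize chosen gates
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

Because only top_k of num_experts experts run for each token, compute scales with the active fraction of expert parameters rather than the total count, which is the effect described above.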
Which models use MoE?
GPT-4 (rumored), Mistral's Mixtral (confirmed), Google's Gemini, and DeepSeek V3. MoE has become the dominant architecture for very large models.