Technique · Updated 2026-04
AI Inference
Definition
Inference is the process of using a trained AI model to generate predictions or responses from new data.
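A minimal sketch of the idea: inference is just applying parameters that were already learned during training to new input. The weights below are made-up illustration values, not from any real model.

```python
# "Inference" = apply already-learned parameters to new data.
# These weights are hypothetical, standing in for a trained model.
weights = [0.8, -0.3, 0.5]

def infer(features):
    """Run inference: combine learned weights with new input features."""
    return sum(w * x for w, x in zip(weights, features))

print(infer([1.0, 2.0, 3.0]))  # prediction for an unseen input
```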
See also in the glossary
L
LLM (Large Language Model)
An LLM is an AI model trained on vast amounts of text, capable of understanding and generating human language.
T
Token
A token is the basic unit processed by an LLM: a fragment of a word, a punctuation mark, or a character that the model uses to understand and generate text.
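As a rough illustration, the toy tokenizer below splits text into words and punctuation. Real LLM tokenizers use subword schemes such as BPE, so this is only a crude stand-in to show that one sentence becomes several billable units.

```python
import re

def toy_tokenize(text):
    """Crude stand-in for an LLM tokenizer: words and punctuation marks.
    Real tokenizers split into learned subword pieces instead."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("Inference isn't free!")
print(tokens)      # ['Inference', 'isn', "'", 't', 'free', '!']
print(len(tokens)) # 6 "tokens" for this sentence
```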
A
AI API
An AI API allows developers to integrate artificial intelligence capabilities into their applications.
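For a concrete picture, here is the general shape of a chat-completion request body that most AI APIs accept. The model name and field values are assumptions for illustration, not any specific vendor's API.

```python
import json

# Hypothetical request payload in the common chat-completion shape;
# "example-llm" and the parameter values are illustrative only.
payload = {
    "model": "example-llm",
    "messages": [
        {"role": "user", "content": "Summarize this paragraph."}
    ],
    "max_tokens": 100,
}

body = json.dumps(payload)  # serialized body an app would POST to the API
print(body)
```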
G
GPU Cloud
GPU Cloud provides on-demand graphics processors for training and running AI models without hardware investment.
Frequently asked questions
What's the difference between training and inference?
Training creates the model (expensive, done once). Inference uses the model to respond (cheaper, per request). When you ask ChatGPT a question, that's inference.
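The train-once, infer-many asymmetry can be sketched with a toy model: one least-squares fit (the "training"), then a frozen parameter reused for every prediction (the "inference"). The data is made up for the example.

```python
# Toy data: y = 2x, so training should recover a slope of 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# "Training": one-time least-squares fit of a slope through the origin.
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# "Inference": cheap, repeated per request with the frozen parameter.
def predict(x):
    return slope * x

print(slope)        # ~2.0, learned once
print(predict(10))  # ~20.0, computed on every request
```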
Why does inference cost money?
Each request requires GPU computation: the longer the response and the larger the model, the more the request costs. That's why APIs charge per token.
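A back-of-envelope estimate of per-token billing. The prices below are made-up illustration values, not any provider's actual rates; output tokens are priced higher here because generating them is the expensive part.

```python
# Assumed illustration prices, in dollars per token (not real pricing).
PRICE_PER_INPUT_TOKEN = 0.000002
PRICE_PER_OUTPUT_TOKEN = 0.000006

def request_cost(input_tokens, output_tokens):
    """Estimated cost of one API request under the assumed prices."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

print(request_cost(500, 1000))  # longer responses -> higher cost
```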