Technique Updated 2026-04

Synthetic Data

Definition

Synthetic data is data artificially generated by algorithms or AI models, designed to reproduce the statistical properties of real data without containing personal information.
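As a rough illustration of "reproducing statistical properties", the sketch below fits a mean and standard deviation to a small numeric column and samples new values from a normal distribution with those parameters. The `real_ages` values are made up for the example, and the single-column Gaussian is an assumption chosen for brevity. Real generators model joint distributions across columns, and sampling this way does not by itself guarantee privacy.

```python
import random
import statistics

# Hypothetical "real" column; the values are invented for illustration.
real_ages = [23, 35, 41, 29, 52, 38, 47, 31]

# Estimate the statistics we want the synthetic data to preserve.
mu = statistics.mean(real_ages)
sigma = statistics.stdev(real_ages)

# Draw synthetic values from a normal distribution with the same
# mean and standard deviation (fixed seed for reproducibility).
rng = random.Random(42)
synthetic_ages = [rng.gauss(mu, sigma) for _ in range(1000)]

print(f"real:      mean={mu:.1f}, stdev={sigma:.1f}")
print(f"synthetic: mean={statistics.mean(synthetic_ages):.1f}, "
      f"stdev={statistics.stdev(synthetic_ages):.1f}")
```

None of the sampled values is a record copied from the original column, yet the sample's mean and spread should land close to those of the source.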

Frequently Asked Questions

Can synthetic data replace real data?
Not entirely. Synthetic data is a powerful complement to real data: it fills gaps, increases diversity, and helps protect privacy. But a model trained solely on synthetic data risks model collapse, a gradual loss of quality and diversity that sets in when models learn mostly from generated output, so some grounding in real data is still needed.
How is synthetic data generated?
Several methods exist: LLMs like ChatGPT or Claude for text and structured records, GANs and diffusion models for images, physics simulators for sensor and robotics data, and classic oversampling techniques like SMOTE for imbalanced tabular data.
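To make the tabular case concrete, here is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is a simplified illustration rather than a reference implementation. The function name `smote_like`, its parameters, and the sample rows are all hypothetical, and it assumes purely numeric features and at least two minority-class rows.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority-class
    row and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other rows by squared Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class rows with two numeric features each.
minority_rows = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_rows = smote_like(minority_rows, n_new=5)
for row in new_rows:
    print([round(v, 3) for v in row])
```

Because each new row lies on a line segment between two existing rows, the synthetic points stay inside the region already covered by the minority class instead of being scattered at random.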