Ethics Updated 2026-04

Data Poisoning

Definition

Data poisoning is an attack that injects malicious data into an AI model's training set to corrupt its behavior or predictions.

Frequently Asked Questions

How can data poisoning be detected?
Detection typically combines statistical analysis of the training data (anomaly detection), evaluation of the trained model on a trusted clean dataset, and techniques such as outlier filtering and robust cross-validation.
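As a rough illustration of the outlier filtering mentioned above, the sketch below flags training points that sit unusually far from the center of their own class. The function name flag_outliers, the 3.5 threshold, and the toy data are assumptions made for this example, not part of any particular library. It uses a modified z-score based on the median absolute deviation, which is only one of many possible anomaly scores.

```python
import math
import statistics

def flag_outliers(points, labels, threshold=3.5):
    """Return sorted indices of points unusually far from their class center.

    Hypothetical helper: scores each point by its distance to the
    coordinate-wise median of its class, then applies a one-sided
    modified z-score (median absolute deviation) to those distances.
    """
    flagged = []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        dim = len(points[idx[0]])
        # A median-based center is less affected by the poisoned points themselves.
        center = [statistics.median(points[i][d] for i in idx) for d in range(dim)]
        dists = [math.dist(points[i], center) for i in idx]
        med = statistics.median(dists)
        mad = statistics.median(abs(d - med) for d in dists)
        if mad == 0:
            continue  # all distances (nearly) identical: nothing to rank
        for i, d in zip(idx, dists):
            if 0.6745 * (d - med) / mad > threshold:
                flagged.append(i)
    return sorted(flagged)

# Toy data (assumed): class 0 clusters near (0, 0), class 1 near (5, 5).
# Index 5 is a suspicious sample: it lies in class 1's region but is labeled 0.
points = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.1, 0.1), (5.0, 5.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

suspects = flag_outliers(points, labels)
print(suspects)
```

On this toy data only index 5 is flagged. A filter like this catches crude poisoning only; carefully crafted samples that stay close to the clean distribution would likely pass, which is why it is usually paired with the other checks listed above.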
Which models are most vulnerable to data poisoning?
Models trained on web-scraped data, such as LLMs, are the most exposed because anyone can publish content online. Models fine-tuned on small datasets are also vulnerable, since a few poisoned examples can be enough to change their behavior.
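To make the small-dataset point concrete, here is a deliberately simplified sketch in which a single mislabeled example flips a prediction. The 1-nearest-neighbor classifier, the predict_1nn helper, and the data are all assumptions chosen for brevity. Real fine-tuned models behave differently, but the underlying intuition, that each example carries more weight when the dataset is small, is the same.

```python
import math

def predict_1nn(train, x):
    """Return the label of the training point closest to x (hypothetical helper)."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]

# Assumed toy training set: six clean, correctly labeled points.
clean = [((0.0, 0.0), "benign"), ((0.2, 0.1), "benign"), ((0.1, 0.3), "benign"),
         ((3.0, 3.0), "malicious"), ((3.2, 2.9), "malicious"), ((2.9, 3.1), "malicious")]

# The attacker wants this input to be classified as "benign".
target = (2.8, 2.8)

# One poisoned example: placed next to the target and given the wrong label.
poison = [((2.85, 2.8), "benign")]

before = predict_1nn(clean, target)
after = predict_1nn(clean + poison, target)
print(before, "->", after)
```

Here one poisoned point out of seven is enough to change the prediction for the targeted input from "malicious" to "benign", while predictions far from the poisoned point are unaffected.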