Ethics Updated 2026-04
Data Poisoning
Definition
Data poisoning is an attack that injects malicious data into an AI model's training set to corrupt its behavior or predictions.
See also in the glossary
AI Safety
AI Safety is the field focused on ensuring that AI systems are safe and reliable and do not cause unintended harm.
AI Alignment
AI alignment aims to ensure an artificial intelligence system acts in accordance with human values and intentions.
Fine-tuning
Fine-tuning is the process of further training an existing AI model on a specific dataset to adapt it to a particular domain or task.
Machine Learning
Machine Learning is a branch of AI where systems learn from data to improve their performance without being explicitly programmed for each task.
Deep Learning
Deep Learning is a subset of Machine Learning using multi-layered neural networks to learn complex representations from raw data.
Overfitting
Overfitting occurs when an AI model fits its training data too closely, noise included, and therefore fails to generalize to new data.
Frequently Asked Questions
How can data poisoning be detected?
Detection combines three approaches: statistical analysis of the training data (anomaly detection and outlier filtering), evaluation of the model on a trusted clean dataset, and robust cross-validation.
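As an illustration of the outlier-filtering idea, here is a minimal sketch in Python. It assumes each training sample is a (features, label) pair with numeric features, and it flags samples that lie unusually far from the centroid of their own label, using a modified z-score based on the median absolute deviation. The function names, the 3.5 threshold, and the toy data are assumptions made for this example, not part of any specific tool.

```python
from math import dist
from statistics import median


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def flag_suspects(samples, threshold=3.5):
    """Return sorted indices of samples unusually far from their label's centroid.

    samples: list of (features, label) pairs (hypothetical format).
    threshold: cutoff on the modified z-score (assumed value for this sketch).
    """
    by_label = {}
    for i, (_, label) in enumerate(samples):
        by_label.setdefault(label, []).append(i)

    suspects = []
    for idxs in by_label.values():
        center = centroid([samples[i][0] for i in idxs])
        distances = [dist(samples[i][0], center) for i in idxs]
        med = median(distances)
        mad = median(abs(d - med) for d in distances)
        if mad < 1e-12:
            # Degenerate spread: no meaningful scale to compare against.
            continue
        for i, d in zip(idxs, distances):
            # One-sided score: only samples farther than usual are suspicious.
            if 0.6745 * (d - med) / mad > threshold:
                suspects.append(i)
    return sorted(suspects)


# Toy data: label "a" clusters near (0, 0), label "b" near (5, 5).
# The sample at index 6 carries label "a" but sits in the "b" region.
samples = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((-0.1, 0.0), "a"), ((0.0, -0.1), "a"), ((0.1, 0.1), "a"),
    ((5.0, 5.0), "a"),
    ((5.0, 5.0), "b"), ((5.3, 4.9), "b"), ((4.8, 5.2), "b"),
    ((5.1, 5.4), "b"), ((4.7, 4.8), "b"),
]
suspects = flag_suspects(samples)
print(suspects)
```

Distance to the class centroid is a crude heuristic. Poisoned samples crafted to sit close to clean ones would not be flagged, which is why this kind of filter is usually combined with testing on a trusted clean dataset.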
Which models are most vulnerable to data poisoning?
Models trained on web-scraped data (such as LLMs) are the most exposed, since anyone can publish content online. Models fine-tuned on small datasets are also vulnerable, since a few poisoned examples can be enough to change their behavior.
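The effect of dataset size can be sketched with a toy example. The Python snippet below is not a fine-tuning setup. It uses a one-dimensional nearest-centroid classifier as a stand-in, with made-up data, to show how the same two poisoned samples can flip a prediction when the training set is small yet leave it unchanged when the set is large. All names and values are assumptions for illustration.

```python
def fit_centroids(samples):
    """Compute the mean feature value per label from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def make_clean(repeats):
    """Build a clean toy dataset: label 0 near 2, label 1 near 10."""
    data = []
    for _ in range(repeats):
        data += [(v, 0) for v in (0, 1, 2, 3, 4)]
        data += [(v, 1) for v in (8, 9, 10, 11, 12)]
    return data


# Two poisoned samples: extreme values deliberately mislabeled as class 0.
poison = [(40, 0), (40, 0)]

small_clean = fit_centroids(make_clean(1))               # 10 clean samples
small_poisoned = fit_centroids(make_clean(1) + poison)   # 10 clean + 2 poisoned
large_poisoned = fit_centroids(make_clean(100) + poison) # 1000 clean + 2 poisoned

probe = 12  # a value that belongs to class 1 in the clean data
print(predict(small_clean, probe))
print(predict(small_poisoned, probe))
print(predict(large_poisoned, probe))
```

In the small set the two poisoned samples drag the class 0 centroid past the probe value, so the probe is misclassified. In the large set they are diluted by the clean data and the prediction is unaffected.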