Ethics Updated 2026-04

Data Poisoning

Definition

Data poisoning is an attack that injects malicious data into an AI model's training set to corrupt its behavior or predictions.
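As a rough illustration of this definition, the sketch below trains a toy nearest-centroid classifier twice: once on clean data and once after three mislabeled points are injected. The one-dimensional data, the labels, and the helper names `fit_centroids` and `predict` are all hypothetical and chosen only to make the effect visible. Real attacks target far larger models and datasets.

```python
from statistics import mean

def fit_centroids(samples):
    """Compute the mean feature value per label (a toy 'training' step)."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical clean training set: one numeric feature per example.
clean = [
    (0.0, "benign"), (1.0, "benign"), (2.0, "benign"),
    (8.0, "malicious"), (9.0, "malicious"), (10.0, "malicious"),
]

# Injected examples: benign-looking feature values with the wrong label.
poison = [(-1.0, "malicious")] * 3

clean_model = fit_centroids(clean)
poisoned_model = fit_centroids(clean + poison)

print(predict(clean_model, 4.0))     # benign
print(predict(poisoned_model, 4.0))  # malicious
```

The three injected points drag the "malicious" centroid from 9.0 down to 4.0, so an input that the clean model labels "benign" is now labeled "malicious", even though no original example was modified.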

See also in the glossary

AI Safety

AI Safety is the field focused on ensuring AI systems are safe, reliable and don't cause unintended harm.

AI Alignment

AI alignment aims to ensure an artificial intelligence system acts in accordance with human values and intentions.

Fine-tuning

Fine-tuning is the process of retraining an existing AI model on a specific dataset to adapt it to a particular domain or task.

Machine Learning

Machine Learning is a branch of AI where systems learn from data to improve their performance without being explicitly programmed for each task.

Deep Learning

Deep Learning is a subset of Machine Learning using multi-layered neural networks to learn complex representations from raw data.

Overfitting

Overfitting occurs when an AI model has over-learned the training data and fails to generalize to new data.

Tools related to data poisoning

Hugging Face

The reference open source platform for AI models

4.6/5

Cohere

The enterprise AI platform for NLP and RAG

4.4/5

Frequently Asked Questions

How can data poisoning be detected?

Detection combines statistical analysis of the training data, such as anomaly detection and outlier filtering, with evaluation of the trained model on trusted clean datasets and robust cross-validation.
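One simple form of outlier filtering can be sketched as follows. It flags training examples whose score deviates strongly from the rest, using the modified z-score based on the median absolute deviation (MAD). The per-example `scores` are hypothetical (they could be, for instance, the loss of a trusted reference model on each example), and the 3.5 threshold is a common rule of thumb rather than a fixed standard. This is a minimal sketch, not a complete defense: carefully crafted poison may not look like a statistical outlier at all.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values equal the median, so the z-score is
        # undefined; fall back to flagging anything that differs from it.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # for normally distributed data.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]

# Hypothetical per-example scores; the example at index 5 is suspicious.
scores = [0.9, 1.1, 1.0, 0.95, 1.05, 8.0, 1.02, 0.98]
suspects = flag_outliers(scores)
print(suspects)  # [5]
```

Flagged examples are candidates for manual review or removal before training. The median-based statistic is used here because, unlike the mean and standard deviation, it is not itself pulled toward the outliers it is meant to detect.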

Which models are most vulnerable to data poisoning?

Models trained on web-scraped data, such as LLMs, are the most exposed because anyone can publish content online. Models fine-tuned on small datasets are also vulnerable, since a few poisoned examples can be enough to shift their behavior.