Éthique Updated 2026-04

AI Safety

Definition

AI Safety is the field focused on ensuring AI systems are safe, reliable and don't cause unintended harm.

See also in the glossary

AI alignment aims to ensure an artificial intelligence system acts in accordance with human values and intentions.

AI Hallucination

An AI hallucination is a response generated by an AI model that appears plausible but is factually incorrect or fabricated.

RLHF (Reinforcement Learning from Human Feedback)

RLHF is a training technique that uses human feedback to align an LLM's behavior with user expectations.

Generative AI refers to artificial intelligence systems capable of creating original content: text, images, video, audio, code.

Tools that use ai safety

The AI that understands nuance, by Anthropic

The world's most used conversational AI assistant

Google's AI assistant with 1M token context

The open source Chinese model rivaling GPT-4

Frequently Asked Questions

Why is AI Safety important?

LLMs can generate harmful content, be manipulated through prompt injection, or make biased decisions. AI Safety seeks to prevent these risks.

Who works on AI Safety?

Anthropic (Claude's creator) was explicitly founded for AI Safety. OpenAI, Google DeepMind and Meta also have dedicated teams.