AI Alignment
Definition
AI alignment aims to ensure an artificial intelligence system acts in accordance with human values and intentions.
See also in the glossary
AI Safety
AI Safety is the field focused on ensuring that AI systems are safe and reliable and do not cause unintended harm.
RLHF (Reinforcement Learning from Human Feedback)
RLHF is a training technique that uses human feedback to align an LLM's behavior with user expectations.
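As a rough intuition, here is a toy, self-contained Python sketch of the RLHF idea (all names and numbers are illustrative, not a real pipeline): simulated human preference counts are distilled into a reward score per response, and a softmax policy over candidate responses is nudged toward higher-reward responses with a REINFORCE-style update. In practice the reward model is a neural network and the LLM is fine-tuned with an algorithm such as PPO, usually with a penalty that keeps it close to the original model.

```python
import numpy as np

rng = np.random.default_rng(0)

responses = ["helpful answer", "evasive answer", "rude answer"]

# Step 1 (simulated): human annotators compared pairs of responses;
# win counts stand in for the collected preference data.
win_counts = {"helpful answer": 9, "evasive answer": 3, "rude answer": 0}

# Step 2: a trivially simple "reward model" -- normalized win rate.
total = sum(win_counts.values())
reward = np.array([win_counts[r] / total for r in responses])

# Step 3: the "policy" is a softmax over one logit per response;
# a REINFORCE-style update shifts probability toward high-reward responses.
logits = np.zeros(len(responses))
lr = 0.5
for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    i = rng.choice(len(responses), p=probs)   # sample a response
    advantage = reward[i] - probs @ reward    # reward minus expected reward
    grad = -probs * advantage                 # gradient of log pi(i) w.r.t. logits
    grad[i] += advantage
    logits += lr * grad                       # gradient ascent step

probs = np.exp(logits) / np.exp(logits).sum()
print({r: round(float(p), 3) for r, p in zip(responses, probs)})
# Most of the probability mass ends up on "helpful answer".
```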
LLM (Large Language Model)
An LLM is an AI model trained on vast quantities of text, capable of understanding and generating human language.
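To make "generating human language" concrete, here is a minimal sketch using the Hugging Face transformers library; the small public gpt2 checkpoint is only an example, and any causal language model works the same way.

```python
# Minimal next-token generation with a small public causal LM.
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("AI alignment aims to", return_tensors="pt")
# The model repeatedly predicts the most likely next token,
# appending each prediction to the context.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```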
AI Hallucination
An AI hallucination is a response generated by an AI model that appears plausible but is factually incorrect or fabricated.
Frequently Asked Questions
Are Alignment and Safety the same?
Related but different. Safety focuses on preventing immediate harm; alignment ensures an AI pursues the right objectives over the long term, even as systems become more capable.
Why is alignment hard?
Precisely specifying what we want is surprisingly difficult. An LLM optimized simply to 'be helpful' might lie if a lie is what the user wants to hear. Alignment research looks for objectives that balance competing goals such as truthfulness and user approval.
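A toy illustration of this difficulty, with entirely hypothetical scores: when the objective counts only user approval, a pleasing falsehood outranks an accurate but unwelcome answer; adding a weight on accuracy flips the ranking.

```python
# Hypothetical scores for two candidate responses to the same question.
candidates = {
    "accurate but unwelcome answer": {"accuracy": 1.0, "user_approval": 0.2},
    "pleasing but false answer":     {"accuracy": 0.0, "user_approval": 0.9},
}

def naive_reward(scores):
    # Misspecified objective: "be helpful" read as "please the user".
    return scores["user_approval"]

def balanced_reward(scores, w=0.7):
    # Balance truthfulness against approval instead of approval alone.
    return w * scores["accuracy"] + (1 - w) * scores["user_approval"]

print(max(candidates, key=lambda c: naive_reward(candidates[c])))
# -> pleasing but false answer
print(max(candidates, key=lambda c: balanced_reward(candidates[c])))
# -> accurate but unwelcome answer
```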