Plain meaning
Start with the shortest useful explanation before going deeper.
The practice of ensuring AI systems behave according to human intentions and values—being helpful, harmless, and honest. Alignment encompasses training-time techniques (RLHF, Constitutional AI, DPO), inference-time guardrails, and evaluation through red teaming. As models become more capable, alignment becomes critical to prevent harmful content generation or manipulation by bad actors.