Leitura rápida
Comece pela explicação mais curta e útil antes de aprofundar.
The practice of ensuring AI systems behave according to human intentions and values—being helpful, harmless, and honest. Alignment encompasses training-time techniques (RLHF, Constitutional AI, DPO), inference-time guardrails, and evaluation through red teaming. As models become more capable, alignment becomes critical to prevent harmful content generation or manipulation by bad actors.