Lectura rápida
Empieza por la explicación más corta y útil antes de profundizar.
An alignment technique developed by Anthropic where an AI model is guided by a 'constitution'—a set of explicit principles defining allowed and disallowed behavior—rather than relying solely on human feedback. The model critiques and revises its own outputs against these principles. Constitutional Classifiers extend this by training input/output classifiers that detect policy violations at low compute cost.