AI / ML

Constitutional AI

An alignment technique developed by Anthropic where an AI model is guided by a 'constitution'—a set of explicit principles defining allowed and disallowed behavior—rather than relying solely on human feedback. The model critiques and revises its own outputs against these principles. Constitutional Classifiers extend this by training input/output classifiers that detect policy violations at low compute cost.

ID: constitutional-ai · Alias: CAI

Quick read

Start with the shortest useful explanation before going deeper.

A 'constitution' of explicit, written principles steers the model: it critiques and revises its own outputs against those principles instead of relying solely on human feedback, and Constitutional Classifiers turn the same principles into cheap input/output checks.

Mental model

Use the short analogy first to reason better about the term when it appears in code, docs, or prompts.

Think of it as handing the model a written rulebook: instead of waiting for a human reviewer on every output, the model checks each draft against explicit principles and rewrites it until it complies.
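That loop is straightforward to sketch. In the snippet below, `generate` is a hypothetical stand-in for any LLM completion call (stubbed so the control flow runs end to end), and the two principles are illustrative, not Anthropic's actual constitution:

```python
# Minimal sketch of the Constitutional AI critique-and-revise loop.
# `generate` is a hypothetical placeholder for a real LLM call, and the
# principles below are illustrative, not Anthropic's actual constitution.

CONSTITUTION = [
    "Do not help with harmful or illegal activities.",
    "Be honest; flag uncertainty instead of fabricating facts.",
]

def generate(prompt: str) -> str:
    """Stand-in for an LLM completion call (swap in a real client)."""
    return f"<model output for: {prompt[:48]}...>"

def constitutional_revision(user_prompt: str) -> str:
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        # 1) The model critiques its own draft against one principle.
        critique = generate(
            f"Principle: {principle}\nCritique this response:\n{draft}"
        )
        # 2) The model revises the draft to address that critique.
        draft = generate(
            f"Revise the response to fix this critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft

print(constitutional_revision("Explain what a validator does."))
```

In the published method, the (draft, revision) pairs feed a supervised fine-tuning stage, and AI-generated preference labels over revisions replace human labels in the subsequent RL stage (RLAIF).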

Technical context

Locate the term within the layer of the stack where it lives to reason about it more precisely.

LLMs, RAG, embeddings, inference, and agent-oriented primitives.

Why it matters to a builder

Turns the term from vocabulary into something operational for product and engineering.

This term unlocks adjacent concepts quickly, so it works best when treated as a connection point rather than an isolated definition.

AI handoff

Use this compact block when you want to give an agent or assistant solid context without dumping the whole page.

Constitutional AI (constitutional-ai)
Category: AI / ML
Definition: An alignment technique developed by Anthropic where an AI model is guided by a 'constitution'—a set of explicit principles defining allowed and disallowed behavior—rather than relying solely on human feedback. The model critiques and revises its own outputs against these principles. Constitutional Classifiers extend this by training input/output classifiers that detect policy violations at low compute cost.
Aliases: CAI
Related: AI Alignment, RLHF (Reinforcement Learning from Human Feedback)

Concept graph

See the term as part of a network, not an isolated definition.

These branches show which concepts this term touches directly and what sits one layer beyond them.

Branch

AI Alignment

The practice of ensuring AI systems behave according to human intentions and values—being helpful, harmless, and honest. Alignment encompasses training-time techniques (RLHF, Constitutional AI, DPO), inference-time guardrails, and evaluation through red teaming. As models become more capable, alignment becomes critical to prevent harmful content generation or manipulation by bad actors.
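Of those pieces, inference-time guardrails are the ones that show up most directly in application code: classify the input and output, then block or rewrite on a violation. A toy sketch, where a keyword check stands in for a trained classifier (such as the Constitutional Classifiers mentioned above):

```python
# Toy inference-time guardrail: gate model I/O through a cheap check.
# A keyword blocklist stands in for a trained input/output classifier.

BLOCKLIST = ("synthesize the toxin", "bypass the safety")

def violates_policy(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

def guarded_reply(user_input: str, model_reply: str) -> str:
    if violates_policy(user_input) or violates_policy(model_reply):
        return "Request declined: it conflicts with the usage policy."
    return model_reply

print(guarded_reply("How do I bypass the safety checks?", "Sure, ..."))
```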

Branch

RLHF (Reinforcement Learning from Human Feedback)

A training technique that aligns LLM outputs with human preferences. Process: (1) train a reward model from human comparisons of outputs, (2) use reinforcement learning (PPO) to optimize the LLM against the reward model. RLHF makes models more helpful, harmless, and honest. Used by Claude, ChatGPT, and other assistants. Alternatives include DPO (Direct Preference Optimization) and Constitutional AI.
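Step (1) boils down to a pairwise preference objective: the reward model should score the human-preferred output above the rejected one. A minimal numpy sketch of that loss (the Bradley-Terry form commonly used for reward models; the scores are made-up placeholders):

```python
import numpy as np

def reward_model_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this pushes the reward model to score the human-preferred
    output above the rejected one for each comparison pair.
    """
    margin = r_chosen - r_rejected
    return float(np.mean(np.log1p(np.exp(-margin))))  # = -mean(log sigmoid(margin))

# Made-up scalar rewards for three comparison pairs.
chosen = np.array([1.2, 0.4, 2.0])
rejected = np.array([0.3, 0.9, 1.1])
print(reward_model_loss(chosen, rejected))  # lower when chosen > rejected
```

Step (2) then runs PPO against this learned reward, typically with a KL penalty that keeps the tuned model close to the base model.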

Next concepts to explore

Keep the learning chain moving instead of stopping at a single definition.

These are the next concepts worth opening if you want this term to make more sense inside a real Solana workflow.

AI / ML

AI Alignment

The practice of ensuring AI systems behave according to human intentions and values—being helpful, harmless, and honest. Alignment encompasses training-time techniques (RLHF, Constitutional AI, DPO), inference-time guardrails, and evaluation through red teaming. As models become more capable, alignment becomes critical to prevent harmful content generation or manipulation by bad actors.

AI / ML

RLHF (Reinforcement Learning from Human Feedback)

A training technique that aligns LLM outputs with human preferences. Process: (1) train a reward model from human comparisons of outputs, (2) use reinforcement learning (PPO) to optimize the LLM against the reward model. RLHF makes models more helpful, harmless, and honest. Used by Claude, ChatGPT, and other assistants. Alternatives include DPO (Direct Preference Optimization) and Constitutional AI.

AI / ML

Context Window

The maximum amount of text (measured in tokens) an LLM can process in a single interaction. Larger windows enable processing more code/documentation at once. Sizes vary: GPT-4 (128K tokens), Claude (200K tokens), Gemini (1M+ tokens). One token ≈ 4 characters in English. Context window limits affect how much codebase an AI can analyze in a single request.
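The ≈4 characters/token rule of thumb makes rough budget checks one-liners. A quick sketch using the window sizes quoted above (real tokenizers vary by language and content, so treat the numbers as estimates):

```python
# Rough context-budget check using the ~4 characters/token heuristic.
# Real tokenizers (BPE variants) vary by language and content.

WINDOWS = {"gpt-4": 128_000, "claude": 200_000, "gemini": 1_000_000}

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits(text: str, model: str, reserve_for_output: int = 4_096) -> bool:
    """True if the prompt plausibly fits, leaving room for the reply."""
    return estimate_tokens(text) + reserve_for_output <= WINDOWS[model]

doc = "x" * 600_000  # ~150K tokens of source code or docs
for model in WINDOWS:
    print(model, fits(doc, model))  # gpt-4: False, claude/gemini: True
```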

AI / ML

Claude Code

Anthropic's terminal-based agentic coding tool launched in early 2025 alongside Claude 3.7 Sonnet. It accepts natural-language commands in the shell and autonomously performs multi-step coding tasks including file editing, git operations, test execution, and large-scale refactoring using a 200K token context window. Claude Code can be extended with hooks, MCP servers, and custom slash commands for project-specific workflows.

Related terms

Follow the concepts that actually give this term context.

Glossary entries become useful when they are connected. These links are the shortest path to adjacent ideas.

AI / ML · ai-alignment

AI Alignment

AI / ML · rlhf

RLHF (Reinforcement Learning from Human Feedback)

More in this category

Stay in the same layer and keep building context.

These entries live alongside the current term and help the page feel like part of a broader knowledge graph rather than a dead end.

AI / ML

LLM (Large Language Model)

A neural network trained on vast text corpora to understand and generate human language. LLMs (GPT-4, Claude, Llama, Gemini) use transformer architectures with billions of parameters. They power chatbots, code generation, summarization, and reasoning tasks. In blockchain development, LLMs assist with smart contract writing, audit review, documentation, and code explanation.

AI / ML

Transformer

The neural network architecture underlying modern LLMs, introduced in 'Attention Is All You Need' (2017). Transformers use self-attention mechanisms to process input sequences in parallel (unlike recurrent networks). Key components: multi-head attention, positional encoding, feedforward layers, and layer normalization. Variants include encoder-only (BERT), decoder-only (GPT), and encoder-decoder (T5).
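Of the components listed, positional encoding is the easiest to show end to end. A numpy sketch of the sinusoidal scheme from the 2017 paper, which gives each position a unique signature since self-attention alone is order-blind:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(...).

    Gives each position a smoothly varying signature that attention layers
    can use to recover token order.
    """
    positions = np.arange(seq_len)[:, None]       # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64); added to token embeddings before the first block
```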

AI / ML

Attention Mechanism

A neural network component that allows models to weigh the relevance of different parts of the input when producing output. Self-attention computes query-key-value dot products across all positions, enabling each token to 'attend' to every other token. Multi-head attention runs multiple attention functions in parallel. Attention is O(n²) in sequence length, driving context window research.
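That description maps almost line for line onto numpy. A single-head sketch with random matrices standing in for the learned projections; the (n, n) score matrix is exactly where the O(n²) cost comes from:

```python
import numpy as np

def self_attention(x: np.ndarray, w_q, w_k, w_v) -> np.ndarray:
    """Scaled dot-product self-attention for one head.

    x: (n, d) token representations; returns (n, d) outputs where every
    token is a relevance-weighted mix of all value vectors.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n, n): all pairs -> O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
n, d = 6, 16  # 6 tokens, 16-dim embeddings
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (6, 16)
```

Multi-head attention runs several such heads in parallel on smaller projections and concatenates their outputs.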

AI / ML

Foundation Model

A large AI model trained on broad data that can be adapted for many downstream tasks. Foundation models (GPT-4, Claude, Llama 3, Gemini) are pre-trained on internet-scale text/code and can be fine-tuned, prompted, or used via APIs for specific applications. The term emphasizes that one base model serves as the foundation for diverse use cases rather than training task-specific models.