Glossary OS

Plain meaning

Start with the shortest useful explanation before going deeper.

Artificially generated training data produced by LLMs or other AI models and used to augment or replace human-annotated datasets. Common techniques include prompt-based generation, retrieval-augmented pipelines, and iterative self-refinement. Synthetic data cuts costs from $5-20 per human-labeled preference data point to under $0.01 per generated sample, and it became central to post-training pipelines in 2024-2025.
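The following minimal Python sketch shows how prompt-based generation and iterative self-refinement fit together. `Generator` is a hypothetical callable standing in for whatever model API you use. The prompt templates, the "OK" stopping convention in the critique step, and the duplicate filter are illustrative assumptions, not a specific library's pipeline. A retrieval-augmented pipeline would also insert retrieved documents into the prompts, which this sketch omits. The `fake_model` stub exists only so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical stand-in for any text-generation call (hosted LLM API, local model, ...).
Generator = Callable[[str], str]


@dataclass
class Sample:
    instruction: str
    response: str
    rounds: int  # number of refinement passes actually applied


def generate_sample(model: Generator, topic: str, max_rounds: int = 2) -> Sample:
    # Prompt-based generation: the model writes both the task and a first draft.
    instruction = model(f"Write one question a developer might ask about {topic}.")
    response = model(f"Answer concisely: {instruction}")
    rounds = 0
    # Iterative self-refinement: critique the draft, revise, and stop early on "OK".
    for _ in range(max_rounds):
        critique = model(
            f"List problems with this answer, or reply OK.\nQ: {instruction}\nA: {response}"
        )
        if critique.strip() == "OK":
            break
        response = model(f"Revise the answer to fix: {critique}\nQ: {instruction}\nA: {response}")
        rounds += 1
    return Sample(instruction, response, rounds)


def build_dataset(model: Generator, topics: Iterable[str], max_rounds: int = 2) -> List[Sample]:
    kept: List[Sample] = []
    seen = set()
    for topic in topics:
        sample = generate_sample(model, topic, max_rounds)
        # Cheap filtering: drop empty answers and exact-duplicate instructions.
        if not sample.response.strip() or sample.instruction in seen:
            continue
        seen.add(sample.instruction)
        kept.append(sample)
    return kept


def fake_model(prompt: str) -> str:
    # Deterministic stub so the sketch runs without any model or network access.
    if prompt.startswith("Write one question"):
        topic = prompt.split("about ", 1)[1].rstrip(".")
        return f"How do I use {topic}?"
    if prompt.startswith("Answer concisely"):
        return "Draft answer."
    if prompt.startswith("List problems"):
        return "OK" if "A: Revised" in prompt else "Too vague."
    if prompt.startswith("Revise"):
        return "Revised answer."
    return ""


if __name__ == "__main__":
    for s in build_dataset(fake_model, ["PDAs", "PDAs", "rent"]):
        print(s)
```

Capping `max_rounds` bounds the number of extra model calls per sample. That matters when low per-sample cost is the main reason to use synthetic data in the first place.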

Mental model

Use the quick analogy first so the term is easier to reason about when you meet it in code, docs, or prompts.

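Think of it as a model drafting the practice problems and worked answers that another model (or a later version of itself) then trains on. In the stack behind agentic and LLM-powered Solana products, it sits on the training side rather than the context or inference side.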

Technical context

Place the term inside its Solana layer so the definition is easier to reason about.

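AI / ML layer: LLMs, RAG, embeddings, inference, and agent-facing primitives. Synthetic data belongs to the training and post-training part of this layer. It feeds fine-tuning, distillation, and preference-optimization runs rather than inference-time calls.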

Why builders care

Turn the term from vocabulary into something operational for product and engineering work.

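Synthetic data is a common source of the datasets behind knowledge distillation, DPO, and fine-tuning. Treat it as a junction term rather than an isolated definition: once it clicks, those adjacent concepts become easier to follow.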

AI handoff

Use this compact block when you want to give an agent or assistant grounded context without dumping the entire page.

Synthetic Data (AI Training) (synthetic-data)
Category: AI / ML
Definition: Artificially generated training data produced by LLMs or other AI models and used to augment or replace human-annotated datasets. Common techniques include prompt-based generation, retrieval-augmented pipelines, and iterative self-refinement. Synthetic data cuts costs from $5-20 per human-labeled preference data point to under $0.01 per generated sample, and it became central to post-training pipelines in 2024-2025.
Aliases: AI-Generated Training Data
Related: Knowledge Distillation, DPO (Direct Preference Optimization), Fine-Tuning


Concept graph

See the term as part of a network, not a dead-end definition.

These branches show which concepts this term touches directly and what sits one layer beyond them.

Branch

Knowledge Distillation

A technique for transferring capabilities from a large 'teacher' model to a smaller 'student' model, typically by having the teacher generate a synthetic dataset that the student is then fine-tuned on. Distilled models can match or even exceed teacher performance on specific tasks while being much cheaper to deploy. The approach was common in 2024-2025 for building efficient specialized models.
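The teacher-generates, student-learns step can be sketched in a few lines of Python. `Teacher` is a hypothetical callable for the large model's generation call. The `prompt`/`completion` JSONL layout is an assumed record format, so check what your fine-tuning tool actually expects. Training the student on the resulting file is out of scope here.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for the large teacher model's generation call.
Teacher = Callable[[str], str]


def distill_dataset(
    teacher: Teacher, prompts: Iterable[str], min_chars: int = 1
) -> List[Dict[str, str]]:
    """Collect teacher outputs as synthetic (prompt, completion) pairs for a student."""
    records: List[Dict[str, str]] = []
    for prompt in prompts:
        completion = teacher(prompt).strip()
        # Skip empty or too-short teacher outputs instead of training on them.
        if len(completion) < min_chars:
            continue
        records.append({"prompt": prompt, "completion": completion})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    # One JSON object per line: an assumed format, not a specific tool's schema.
    return "\n".join(json.dumps(r) for r in records)


if __name__ == "__main__":
    # Toy teacher for demonstration: echoes prompts in upper case, refuses one.
    def toy_teacher(prompt: str) -> str:
        return "" if "skip" in prompt else prompt.upper()

    print(to_jsonl(distill_dataset(toy_teacher, ["what is a pda", "skip me"])))
```

The length check is a deliberately crude quality gate. Any stronger filter, such as scoring teacher outputs before keeping them, is a design choice layered on top of this sketch.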


Knowledge Distillation

DPO (Direct Preference Optimization)

Fine-Tuning

Keep the learning chain moving instead of stopping at one definition.

Knowledge Distillation

DPO (Direct Preference Optimization)

Fine-Tuning

System Prompt

Terms nearby in vocabulary, acronym, or conceptual neighborhood.

Training (ML)

Follow the concepts that give this term its actual context.

Knowledge Distillation

DPO (Direct Preference Optimization)

Fine-Tuning

Stay in the same layer and keep building context.

LLM (Large Language Model)

Transformer

Attention Mechanism

Foundation Model