Glossary OS

Plain meaning

Start with the shortest useful explanation before going deeper.

Inference is the process of running a trained model on new inputs to generate predictions or outputs. It is the 'using' phase, as opposed to training. Inference cost depends on model size, input and output token counts, and hardware (GPUs/TPUs). API providers such as Anthropic and OpenAI charge per token for inference. On-device inference (for example, llama.cpp running models stored in the GGUF file format) runs locally without API calls.
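To make the per-token billing idea concrete, here is a minimal sketch of a cost estimate for a single request. The `TokenPricing` class, the function name, and the prices are hypothetical placeholders chosen for illustration; they are not real provider rates, and actual billing rules vary by provider and model.

```python
from dataclasses import dataclass


@dataclass
class TokenPricing:
    # Hypothetical prices in USD per one million tokens (not real provider rates).
    input_per_million: float
    output_per_million: float


def estimate_inference_cost(input_tokens: int, output_tokens: int,
                            pricing: TokenPricing) -> float:
    """Estimate the USD cost of one request under simple per-token billing."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Assumed example values: a 1,500-token prompt and a 500-token response.
example_pricing = TokenPricing(input_per_million=2.0, output_per_million=8.0)
cost = estimate_inference_cost(input_tokens=1_500, output_tokens=500,
                               pricing=example_pricing)
print(f"Estimated cost: ${cost:.4f}")
```

Input and output tokens are priced separately here because the definition above names both counts as cost drivers; a real estimate would also need the provider's current price sheet.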

Mental model

Use the quick analogy first so the term is easier to reason about when you meet it in code, docs, or prompts.

Think of inference as the runtime step in the stack behind agentic and LLM-powered Solana products: the model is already trained, and each request simply runs it on new input.

Technical context

Place the term inside its Solana layer so the definition is easier to reason about.

This term belongs to the AI / ML layer: LLMs, RAG, embeddings, inference, and agent-facing primitives.

Why builders care

Turn the term from vocabulary into something operational for product and engineering work.

This term unlocks adjacent concepts quickly, so it works best when you treat it as a junction instead of an isolated definition.

AI handoff

Use this compact block when you want to give an agent or assistant grounded context without dumping the entire page.

Inference (inference)
Category: AI / ML
Definition: The process of running a trained model on new inputs to generate predictions or outputs. Inference is the 'using' phase, as opposed to training. Inference cost depends on model size, input and output token counts, and hardware (GPUs/TPUs). API providers such as Anthropic and OpenAI charge per token for inference. On-device inference (for example, llama.cpp running models stored in the GGUF file format) runs locally without API calls.
Related: LLM (Large Language Model), Token (AI/NLP)

Concept graph

See the term as part of a network, not a dead-end definition.

These branches show which concepts this term touches directly and what sits one layer beyond them.

Branch

LLM (Large Language Model)

A neural network trained on vast text corpora to understand and generate human language. LLMs (GPT-4, Claude, Llama, Gemini) use transformer architectures with billions of parameters. They power chatbots, code generation, summarization, and reasoning tasks. In blockchain development, LLMs assist with smart contract writing, audit review, documentation, and code explanation.

Transformer, Foundation Model

Branch

Token (AI/NLP)

The basic unit of text processed by language models—typically a word, subword, or character. Tokenizers (BPE, SentencePiece) split text into tokens for model input. 'Solana blockchain' might tokenize as ['Sol', 'ana', ' block', 'chain']. Token count determines context window usage and API billing. Not to be confused with blockchain tokens (cryptocurrency assets).
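As a rough illustration of how text becomes subword tokens, the sketch below uses a toy greedy longest-match tokenizer. This is a deliberate simplification and is not how BPE or SentencePiece actually work; the function name and the four-piece vocabulary are hypothetical, chosen only so the output matches the 'Solana blockchain' split described above.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary piece that matches.

    Toy illustration only: real tokenizers (BPE, SentencePiece) learn their
    vocabularies and apply different algorithms.
    """
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary, assumed for this example.
toy_vocab = {"Sol", "ana", " block", "chain"}
tokens = greedy_tokenize("Solana blockchain", toy_vocab)
print(tokens)       # ['Sol', 'ana', ' block', 'chain']
print(len(tokens))  # 4 tokens count toward context window usage and billing
```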

Context Window

Next concepts to explore

Keep the learning chain moving instead of stopping at one definition.

These are the next concepts worth opening if you want this term to make more sense inside a real Solana workflow.

AI / ML

io.net

A decentralized GPU network built on Solana that aggregates underutilized GPU resources from data centers, crypto miners, and individual contributors into clusters for AI and machine learning workloads. io.net uses the IO token for payments and staking, and enables users to deploy GPU clusters on demand at costs significantly below centralized cloud providers. The network supports training, inference, and fine-tuning workflows.

AI / ML

Hallucination

When an AI model generates plausible-sounding but factually incorrect information. LLMs hallucinate because they predict likely token sequences, not verified facts. In blockchain development, hallucinations can be dangerous—incorrect API usage, nonexistent functions, or wrong program addresses. Mitigation: RAG for grounding, code verification, testing, and using models with lower hallucination rates.

Commonly confused with

Terms nearby in vocabulary, acronym, or conceptual neighborhood.

These entries are easy to mix up when you are reading quickly, prompting an LLM, or onboarding into a new layer of Solana.

AI / ML

Decentralized Inference

Running AI model inference across distributed networks of GPU providers rather than centralized cloud infrastructure, using blockchain for coordination, payment, and verification. Key verification approaches include ZKML (zero-knowledge proofs of correct inference) and trusted execution environments (TEEs). Projects include Bittensor, as well as Render Network and io.net on Solana.

Aliases: Proof of Inference, ZKML

Related terms

Follow the concepts that give this term its actual context.

Glossary entries become useful when they are connected. These links are the shortest path to adjacent ideas.

AI / ML

LLM (Large Language Model)

AI / ML

Token (AI/NLP)

Stay in the same layer and keep building context.

These entries live beside the current term and help the page feel like part of a larger knowledge graph instead of a dead end.

AI / ML

Transformer

The neural network architecture underlying modern LLMs, introduced in 'Attention Is All You Need' (2017). Transformers use self-attention mechanisms to process input sequences in parallel (unlike recurrent networks). Key components: multi-head attention, positional encoding, feedforward layers, and layer normalization. Variants include encoder-only (BERT), decoder-only (GPT), and encoder-decoder (T5).

AI / ML

Attention Mechanism

A neural network component that allows models to weigh the relevance of different parts of the input when producing output. Self-attention computes query-key-value dot products across all positions, enabling each token to 'attend' to every other token. Multi-head attention runs multiple attention functions in parallel. Attention is O(n²) in sequence length, driving context window research.
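The query-key-value computation can be sketched in plain Python. This is a minimal, assumption-laden version: it uses scaled dot-product attention with a single head, and it passes the raw inputs as queries, keys, and values instead of applying the learned projection matrices a real model would have. The function names and the sample vectors are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    dim = len(queries[0])
    outputs, weight_rows = [], []
    for q in queries:
        # One score per position: every token attends to every other token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        weight_rows.append(weights)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs, weight_rows


# Three toy token vectors; queries, keys, and values are all the raw inputs
# (a simplification: real models apply learned projections first).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)
print(len(weights), "x", len(weights[0]), "attention weights")
```

The weight matrix has one row and one column per token, which is where the O(n²) cost in sequence length comes from.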

AI / ML

Foundation Model

A large AI model trained on broad data that can be adapted for many downstream tasks. Foundation models (GPT-4, Claude, Llama 3, Gemini) are pre-trained on internet-scale text/code and can be fine-tuned, prompted, or used via APIs for specific applications. The term emphasizes that one base model serves as the foundation for diverse use cases rather than training task-specific models.
