AI Terminology Glossary: Essential Terms Explained

AI Unpacking

Disclosure

Important reader notice

This article is for general informational and educational purposes only. It is not legal, financial, tax, medical, security, compliance, or other professional advice, and you should not rely on it as a substitute for advice from a qualified professional who understands your specific situation.

AI tools, pricing, features, policies, laws, and platform terms can change quickly. We work to keep content accurate, but we do not guarantee that every detail is current, complete, or suitable for your use case. Always verify important claims with the original source before making business, legal, financial, safety, or purchasing decisions.

Some links may be affiliate, partner, or sponsored links. If you buy through them, AIUnpacking may earn compensation at no extra cost to you. Sponsored relationships are disclosed where applicable, and compensation does not override our editorial judgment.

If you have spent any time reading about AI in 2026, you have probably run into words like “embeddings,” “RAG,” or “agentic AI” and just nodded along hoping nobody asked what they meant. You are not alone. This glossary explains every key term in plain English, the way you would describe it to a coworker over coffee. No PhD required.

AGI (Artificial General Intelligence)

A hypothetical AI that matches or exceeds human-level performance across virtually any intellectual task. As of May 2026, no publicly available system qualifies as AGI, and even top researchers disagree on what it means or how close we are. Take any claim of “we built AGI” with heavy skepticism.

Agentic AI / AI Agent

An AI that takes multi-step actions on its own rather than just answering one question. While a regular chatbot responds to a prompt, an agent browses the web, calls APIs, runs code, and chains tasks together to achieve a goal. Agentic AI is one of the biggest enterprise trends of 2026.

AI Governance

The policies and oversight processes that ensure AI is deployed responsibly. For businesses in 2026, governance covers who can use AI tools, what data can be shared, how outputs get reviewed, and how AI-assisted decisions are documented. The EU AI Act has made this a board-level concern.

Alignment

The field of making AI systems behave according to human intentions and safety expectations. Even highly capable models can do things you never intended, so alignment uses techniques like RLHF, red teaming, and safety training to close that gap. It is a top priority at every major AI lab in 2026.

Attention

The mechanism that lets a model decide which parts of its input matter most. If you read “The cat sat on the mat because it was tired,” attention helps the model understand that “it” means “the cat,” not “the mat.” Introduced in the 2017 paper “Attention Is All You Need,” it is the backbone of every modern language model.

Benchmark

A standardized test used to compare AI models. Common ones in 2026 include MMLU (general knowledge), HumanEval (coding), SWE-Bench (software engineering), and MATH (math reasoning). Benchmarks are useful signals but do not always predict real-world performance, and models can be trained to game specific tests.

Chain of Thought (CoT)

When a model breaks a complex problem into smaller steps before answering, similar to showing your work on a math test. Reasoning models like GPT-5.5 Thinking and Claude Opus 4.7 are optimized for this. CoT takes longer and costs more, but answers are dramatically more accurate for logic, coding, and math.

Context Window

The maximum amount of text an AI can “see” and process at once, like its working memory. As of May 2026, Gemini 3.1 Pro supports 1 million tokens (with a 2 million option), Claude Opus 4.7 handles 1 million, and GPT-5.5 Pro offers about 400,000. One million tokens is roughly the length of the entire Harry Potter series.

Deep Learning

A type of machine learning using multi-layered neural networks to process data at increasing levels of abstraction. The “deep” refers to the many layers. Deep learning powers image recognition, speech synthesis, translation, and the language understanding inside modern LLMs.

Distillation

A technique where a smaller “student” model learns by mimicking a larger “teacher” model’s outputs, creating a faster, cheaper model that retains much of the larger model’s capability. Most commercial API terms of service now explicitly prohibit distilling from their models.

Embedding

A numerical representation of text or content that captures meaning. When “revenue growth” is converted into a vector of numbers, that vector sits mathematically close to “sales increase” because the concepts are similar. Embeddings power semantic search, recommendation engines, and RAG systems.

Fine-Tuning

Taking a pre-trained model and continuing its training on a specialized dataset. A law firm might fine-tune a general model on legal contracts to improve drafting accuracy. Unlike RAG, which connects a model to external documents at query time, fine-tuning permanently changes the model’s behavior.

Foundation Model

A large AI model trained on broad data that can adapt to many tasks. GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, and Llama 4 are all foundation models. The term distinguishes these general-purpose workhorses from smaller models trained for a single narrow task.

Generative AI

AI systems that create new content (text, images, audio, video, code) rather than just classifying or analyzing existing data. ChatGPT, Claude, Gemini, and Midjourney are all generative AI tools. The output is generated fresh each time from learned patterns, not retrieved from a database.

GPU (Graphics Processing Unit)

The hardware backbone of AI. Originally built for video game graphics, GPUs excel at the parallel math that neural networks require. Nvidia dominates with H100 and B200 chips, and the global race to secure GPU supply is one of the defining economic stories of 2026.

Hallucination

When an AI confidently produces information that is factually wrong or entirely made up. It happens because language models predict statistically likely word sequences rather than retrieving facts. The best defenses are using RAG to ground responses in real documents and treating AI output as a draft that needs review.

Inference

What happens when you send a prompt and get a response: using a trained model to generate output. Training happens once (expensive, takes weeks or months). Inference happens millions of times daily. When businesses talk about AI costs at scale, they mean inference costs.

LLM (Large Language Model)

An AI model trained on enormous amounts of text, capable of understanding, generating, summarizing, and reasoning about language. GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, and Llama 4 are all LLMs. The “large” refers to the scale of training data and internal parameters. When you chat with any AI assistant, an LLM is doing the thinking.

Machine Learning

A subfield of AI where systems learn patterns from data rather than following explicitly programmed rules. Instead of “if X then Y,” you feed the system thousands of examples and it figures out the rules. All modern LLMs are built on machine learning foundations.

MCP (Model Context Protocol)

An open standard introduced by Anthropic that gives AI models a universal way to connect to external tools, data, and services. Think USB-C for AI: one standard protocol instead of custom code for every integration. MCP has seen rapid adoption in 2026.

Multimodal AI

Models that process and generate multiple types of data (text, images, audio, video) in one system. GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7 are all multimodal: upload a chart, ask about a photo, or get the model to describe a video clip, all from one system.

Natural Language Processing (NLP)

The broader AI field focused on enabling computers to understand and generate human language. Tasks like sentiment analysis, translation, and text classification all fall under NLP. LLMs are the most advanced products of decades of NLP research.

Neural Network

A computational system loosely inspired by the human brain, made of interconnected layers of nodes (“neurons”) that process signals. Neural networks are the foundational architecture behind deep learning and all modern LLMs.

Parameters

The numerical values inside an AI model learned during training. When you hear “a 70-billion-parameter model,” that describes the scale of its internal configuration. More parameters generally mean more capacity, but architecture quality and training data matter just as much.

Prompt / Prompt Engineering

A prompt is any instruction you give an AI. Prompt engineering is designing prompts to get better results through techniques like role assignment (“act as a senior editor”), providing examples, and specifying output format. Well-crafted prompts still dramatically outperform vague ones.

RAG (Retrieval-Augmented Generation)

A technique that gives an AI access to external knowledge at query time. Without RAG, a model can only answer from its training data, which may be outdated. With RAG, the model retrieves relevant documents first, then generates a grounded response. This is the primary way businesses give AI access to proprietary data without expensive fine-tuning.

Reasoning Model

An LLM optimized to think step-by-step through complex problems before answering. These models outperform standard ones on math, coding, and decision-making, but are slower and more expensive. Use them for high-stakes analysis, not for asking about the weather.

Red Teaming

Stress-testing AI systems by deliberately trying to make them fail or produce harmful outputs. It is the AI equivalent of hiring ethical hackers to find vulnerabilities before bad actors do. Automated red teaming is standard practice at every major AI lab in 2026.

RLHF (Reinforcement Learning from Human Feedback)

A training method where human evaluators rate model outputs, and those ratings teach the model what “good” looks like. RLHF was instrumental in making ChatGPT usable. A simpler alternative called DPO (Direct Preference Optimization) is also widely used in 2026.

SLM (Small Language Model)

Compact language models designed to run on smartphones, laptops, and edge devices, typically with a few million to 7 billion parameters. In 2026, SLMs are surging because they cut cloud inference costs by up to 90% and can run entirely on-device for privacy.

System Prompt

A pre-set instruction given to a model before user interaction begins, defining how it should behave throughout a conversation. For example: “You are a helpful customer service agent. Only discuss company products.” System prompts are invisible to users but shape everything the model says.

Temperature

A parameter controlling how creative or predictable a model’s outputs are. Low temperature (near 0) produces consistent, factual output. High temperature (near 1 or above) adds randomness for creative results. Use low for data extraction, high for brainstorming.

Token

The basic unit of text a model processes, roughly three-quarters of an English word. Token counts determine how much a prompt costs (most providers charge per token), how much context window is consumed, and how long a response takes.

Transformer

The neural network architecture behind virtually every modern language model. Introduced in the 2017 paper “Attention Is All You Need,” transformers use self-attention to process all parts of an input simultaneously. GPT, Claude, Gemini, Llama: all transformers.

Vector Database

A specialized database that stores embeddings and searches by semantic similarity. Instead of “find rows where name equals John,” it finds “documents conceptually similar to this idea.” Vector databases are the backbone of RAG systems and AI-powered search.

Weights

The numerical parameters inside a neural network that determine how much influence each input has on the output. Training starts with random weights, and through millions of iterations the model adjusts them. The final set of weights is essentially the model’s stored knowledge.

Zero-Shot

When a model performs a task without any examples in the prompt, relying entirely on its training. “Few-shot” is the companion concept: including a couple of examples to guide the model. Zero-shot is simpler; few-shot tends to be more accurate for tricky tasks.

Frequently Asked Questions

What is the difference between AI and machine learning? AI is the broad field of building intelligent systems. Machine learning is a subset where systems learn from data rather than following hand-written rules. All machine learning is AI, but not all AI is machine learning.

What is the difference between an LLM and a chatbot? An LLM is the underlying model. A chatbot is the application built on top. ChatGPT is a chatbot; GPT-5.5 is the LLM powering it. The same LLM can power many different chatbots.

What does “hallucination” actually mean? Hallucination is when a model confidently states something factually incorrect. It happens because models predict likely word sequences, not retrieve verified facts. Always fact-check AI outputs before relying on them.

What is a context window? The maximum text an AI can remember at once. If a conversation exceeds the limit, earlier content is forgotten. Bigger windows let you work with longer documents without the model losing track.

What is the difference between fine-tuning and RAG? Fine-tuning permanently changes a model through additional training. RAG connects a model to external documents at query time without retraining. RAG is faster and cheaper; fine-tuning is better for consistent behavioral changes.

What is an AI agent versus a chatbot? A chatbot answers single questions. An AI agent takes multi-step actions: browsing the web, running code, calling APIs, and chaining tasks together to achieve a goal autonomously.

What is a reasoning model and when should I use one? A reasoning model thinks step-by-step before answering. Use it for complex logic, math, coding, or high-stakes decisions. For simple questions, a standard model is faster and cheaper.

What does “open source” mean for AI models? The model weights are publicly available to download and modify. Training data is rarely shared. “Open weights” is the more accurate term, but “open source” is what most people say.

What is MCP? The Model Context Protocol is a universal standard connecting AI models to tools and data. It eliminates custom integration code and is rapidly becoming the default for agentic AI in 2026.

Verified Sources

Vaswani et al., “Attention Is All You Need,” arXiv, 2017: https://arxiv.org/abs/1706.03762
Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” arXiv, 2020: https://arxiv.org/abs/2005.11401
Anthropic, “Introducing the Model Context Protocol,” November 2024: https://www.anthropic.com/news/model-context-protocol
OpenAI, “Learning to Reason with LLMs,” September 2024: https://openai.com/index/learning-to-reason-with-llms/
TechCrunch, “AI Glossary,” updated May 2026: https://techcrunch.com/2026/05/09/artificial-intelligence-definition-glossary-hallucinations-guide-to-common-ai-terms/
Google Machine Learning Glossary, updated April 2026: https://developers.google.com/machine-learning/glossary
MIT Sloan, “Agentic AI, Explained,” February 2026: https://mitsloan.mit.edu/ideas-made-to-matter/agentic-ai-explained
International AI Safety Report 2026: https://internationalaisafetyreport.org/