AI comes with a vocabulary that can feel deliberately impenetrable. This glossary explains every significant AI term in plain English — from the basics that everyone encounters to the technical concepts that power the systems we use. Each entry works at three levels: plain English, a working definition, and the precise technical meaning.
Computer systems that can perform tasks that normally require human intelligence — things like understanding language, recognising images, making decisions, and generating text or art. The field covers everything from the simple AI that recommends your next Netflix show to the large language models powering ChatGPT.
The type of AI that powers ChatGPT, Claude, Gemini, and most of the AI assistants you interact with. Trained on enormous amounts of text from the internet, books, and other sources — learning to predict and generate human language. “Large” refers to the number of parameters (weights) in the model, often in the billions or trillions. The training process teaches the model patterns in language so deep that it develops apparent understanding of concepts, not just words.
The input you give to an AI — your question, instruction, or request. Writing effective prompts is a genuine skill: the same underlying AI model can produce dramatically different outputs depending on how it is prompted. A well-crafted prompt gives the AI enough context, constraints, and direction to produce useful output. A vague prompt tends to produce a generic response.
When an AI produces confident-sounding information that is factually incorrect or entirely fabricated. The name captures the phenomenon: the AI presents its invention as reality, with the same tone and apparent confidence as something true. Hallucinations are a known limitation of all current language models — particularly for specific facts, statistics, citations, and recent events. The most important thing to understand about AI for research: always verify specific facts independently.
Software that converses with users through text or voice. Early chatbots (like those on customer service websites) operated on rigid scripts — they could only respond to specific inputs within a fixed flow. Modern AI chatbots (ChatGPT, Claude, Gemini) use language models and can hold genuinely open-ended conversations, answer questions about anything, and generate new content. The term is sometimes used loosely to refer to any AI assistant.
AI that creates new content — text, images, audio, video, or code — rather than just analysing or classifying existing content. ChatGPT generating an essay is generative AI. Midjourney generating an image is generative AI. A spam filter classifying emails is not — that is discriminative AI. The current wave of AI tools that has captured public attention (since late 2022) is almost entirely composed of generative AI systems.
How much text an AI can “see” at once — the limit of what it can take into account when generating a response. Early models had small context windows (a few thousand words). Current frontier models have enormous ones — Gemini 1.5 Pro can process over a million tokens, roughly equivalent to several full-length novels simultaneously. A larger context window means you can give the AI more information, upload longer documents, or have longer conversations before the model loses track of earlier content.
The unit that language models process text in. A token is roughly a word or part of a word — “unbelievable” might be 3-4 tokens; “AI” might be 1. Models have limits on tokens (the context window) and API pricing is typically per token. 1,000 tokens is approximately 750 words of English text. This matters in practice when you have very long documents or conversations — the model cannot process more text than its token limit allows at once.
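The word-to-token ratio above can be turned into a rough calculator. This is only the rule of thumb stated here — real tokenizers (byte-pair encoding and variants) split text differently, and the example price is a placeholder, not any provider's actual rate.

```python
# Rough token estimate using the ~750-words-per-1,000-tokens rule of thumb.
# Real tokenizers (BPE) vary by model; this is only an approximation.
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)  # 1 word ≈ 1.33 tokens

def estimate_cost(text: str, usd_per_million_tokens: float) -> float:
    """Estimate API cost for sending `text` at a hypothetical per-token price."""
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens
```

A 75,000-word novel comes out at roughly 100,000 tokens under this approximation — a quick way to check whether a document fits a model's context window.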
The process by which an AI model learns from data. For a language model, training involves exposing the model to vast quantities of text and adjusting the model’s internal parameters (billions of numerical weights) to better predict patterns in that text. Training large models requires enormous computing resources — weeks or months on thousands of specialised chips, costing tens or hundreds of millions of dollars. This is why only a handful of companies (OpenAI, Anthropic, Google, Meta) train frontier models.
The point in time after which a model has no knowledge of world events — because its training data was collected before that date. If you ask a model with a 2024 knowledge cutoff about something that happened in 2025, it will not know. Models with web search (like Perplexity, or ChatGPT with search enabled) can access current information beyond their cutoff. Pure language models without search cannot.
An AI model that can understand and generate more than one type of data — text, images, audio, or video. GPT-4o is multimodal: it can accept an image as input and respond in text. Gemini is multimodal. Early language models were text-only (unimodal). Multimodal capability is increasingly standard in frontier models and is why you can now upload a photo and ask an AI what is in it.
Taking a pre-trained model and continuing to train it on a more specific dataset to specialise its behaviour. If you fine-tune GPT-4 on thousands of legal documents, you get a model that is better at legal language and tasks. Fine-tuning is how companies build specialised AI tools on top of general-purpose foundation models — a medical AI might be a fine-tuned version of a general model, trained further on medical literature and case notes.
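Supervised fine-tuning data is usually prepared as input-output pairs, one JSON object per line (JSONL). The field names below follow a common chat-style convention but are illustrative — each provider defines its own exact schema, and the legal example is hypothetical.

```python
import json

# A common shape for fine-tuning data: one JSON object per line, each
# pairing an input with the desired output. Field names are illustrative,
# not any specific provider's schema.
examples = [
    {"messages": [
        {"role": "user", "content": "Summarise clause 4.2 of this contract."},
        {"role": "assistant", "content": "Clause 4.2 limits liability to direct damages."},
    ]},
]

jsonl = "\n".join(json.dumps(e) for e in examples)
```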
A technique that improves AI accuracy by letting the model search a database of documents before generating a response. Instead of relying entirely on knowledge encoded in its weights (which may be outdated or missing specific information), a RAG system retrieves relevant documents and feeds them into the context window as part of the prompt. Notion AI Q&A, Microsoft 365 Copilot, and Perplexity all use RAG. It dramatically reduces hallucination for factual queries by grounding responses in specific source documents.
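The retrieve-then-generate pattern can be sketched in a few lines. Production RAG systems retrieve by embedding similarity, not word overlap — the scoring function below is a deliberately crude stand-in to show the shape of the pipeline, and the prompt wording is just one convention.

```python
# Toy RAG sketch: score documents by word overlap with the query, then
# prepend the best match to the prompt. Real systems use embedding search.
def retrieve(query: str, documents: list[str]) -> str:
    q = set(query.lower().split())
    return max(documents, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str, documents: list[str]) -> str:
    context = retrieve(query, documents)
    return f"Answer using only this source:\n{context}\n\nQuestion: {query}"
```

Because the model is told to answer from the retrieved text rather than its weights, the response is grounded in a source you can check — which is where the reduction in hallucination comes from.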
A hypothetical AI system that can perform any intellectual task that a human can — learning new skills on its own, reasoning flexibly across domains, applying knowledge in novel situations. Current AI systems are narrow specialists by comparison — extremely good at tasks related to language and perception, but without the flexible, general-purpose intelligence the term describes. Whether and when AGI might be achieved is one of the most debated questions in the field. Some researchers believe it is decades away; others believe current trajectories suggest it may arrive sooner.
A large AI model trained on broad data at scale, designed to be adapted to a wide range of downstream tasks. GPT-4, Claude 3, and Gemini 1.5 are foundation models. The concept was formalised in a 2021 Stanford paper that noted these models’ ability to serve as a general-purpose foundation for many specialised applications — through fine-tuning, prompting, or RAG. The term is sometimes used interchangeably with “base model.”
A training technique that uses human preferences to make AI models more helpful, harmless, and honest. Human raters compare pairs of AI responses and indicate which is better. This preference data trains a “reward model” that scores outputs. The main model is then trained with reinforcement learning to produce outputs that score highly. RLHF is the primary reason that modern AI assistants are so much more useful and less dangerous than raw language models trained only on internet text. OpenAI’s InstructGPT paper (2022) introduced the approach that led to ChatGPT.
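The reward model at the heart of RLHF is typically trained under a Bradley-Terry preference model: the probability that raters prefer response A over response B is the sigmoid of the difference between their reward scores. A minimal sketch of that relationship:

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry model: P(A preferred over B) = sigmoid(r_A - r_B)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))
```

Equal rewards give a 50/50 preference; a large gap pushes the probability towards 1. Training adjusts the reward model so these predicted probabilities match the human comparison data.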
Anthropic’s approach to training Claude to be helpful, harmless, and honest. Rather than relying purely on human raters, Constitutional AI uses a set of principles (a “constitution”) to guide the model’s self-critique and revision of its own outputs. The model is trained to evaluate its responses against the constitution and improve them — reducing dependence on human labelling at scale. Described in Anthropic’s paper: arxiv.org/abs/2212.08073.
Instructions given to an AI model before the user’s conversation begins — typically by the developer or business deploying the model, not visible to end users. System prompts define the model’s persona, capabilities, restrictions, and context for a particular deployment. When you use a customer service chatbot powered by Claude, the deploying company has layered its own system prompt on top of the model’s defaults, telling it how to behave for their specific use case.
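In the common chat-API convention, the system prompt is simply the first message in the conversation the model receives. The structure below follows that widely used shape — exact field names vary by vendor, and “Acme Ltd” is a hypothetical deployment.

```python
# What a deployed model typically sees: a system message (written by the
# developer, invisible to the end user) followed by user turns. Field names
# follow the common chat-API convention; "Acme Ltd" is hypothetical.
messages = [
    {"role": "system", "content": "You are a support agent for Acme Ltd. "
                                  "Only answer questions about Acme products."},
    {"role": "user", "content": "What's your refund policy?"},
]
```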
A parameter that controls how random or deterministic an AI’s outputs are. At temperature 0, the model always picks the highest-probability next token — producing consistent, predictable outputs. At higher temperatures (0.7, 1.0), the model samples more randomly, producing more varied and sometimes more creative outputs. For factual tasks (coding, data extraction), low temperature is better. For creative writing, higher temperature often produces more interesting results.
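Mechanically, temperature divides the model’s raw scores (logits) before they are converted into sampling probabilities. A minimal sketch of that computation, assuming the standard temperature-scaled softmax:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw model scores (logits) into sampling probabilities.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At low temperature the top-scoring token takes nearly all the probability mass (approaching temperature 0’s always-pick-the-best behaviour); at high temperature the distribution flattens and unlikely tokens get sampled more often.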
The skill and practice of crafting prompts that get the best possible output from AI models. Techniques include: adding context and role definitions, using few-shot examples (showing the model examples of desired output), chain-of-thought prompting (asking the model to think through a problem step by step), and output formatting instructions. As models improve, prompt engineering becomes less critical for simple tasks — but remains important for complex or precise requirements.
Zero-shot means giving the model a task with no examples — just instructions. Few-shot means including a small number of examples (“few shots”) of the desired input-output format before your actual request. Few-shot prompting often significantly improves output quality for specific structured tasks, because the examples clarify exactly what format or type of response is expected.
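Few-shot prompting is ultimately just string assembly: the examples precede the real input in the same format. The “Input:/Output:” labels below are one common convention, not a requirement.

```python
# Assembling a few-shot prompt: worked examples of the desired format come
# first, then the real query in the same format, ending where the model
# should continue. The labels and delimiters are just one convention.
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"
```

The zero-shot version of the same task would simply omit the examples and state the instruction directly.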
A language model specifically trained or prompted to work through problems step by step before producing a final answer — sometimes described as “thinking before speaking.” OpenAI’s o1 and o3 series are reasoning models. They produce intermediate reasoning steps (which may or may not be shown to users) and perform significantly better than standard models on complex maths, coding, and logical reasoning tasks. The tradeoff: they are slower and more expensive per query than standard models.
An AI system that can take sequences of actions autonomously to complete a goal — not just generating a single response, but using tools, searching the web, writing and running code, and performing multi-step tasks. Claude Code is an agentic AI for software development. The shift from “AI that answers questions” to “AI that completes tasks” is one of the most significant developments underway. Agentic AI introduces new challenges around safety and oversight that are active areas of research.
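The core of an agent is a loop: choose an action, execute a tool, observe the result, repeat until done. In a real agent the language model makes the choice at each step; in this deliberately simplified sketch a hard-coded policy stands in for the model, and the single “search” tool is hypothetical.

```python
# Conceptual agent loop. A hard-coded policy stands in for the language
# model's decision-making; real agents let the model pick the next action
# based on the goal and the observations so far.
def run_agent(goal: str, tools: dict, max_steps: int = 5) -> list[str]:
    log = []
    for step in range(max_steps):
        action = "search" if step == 0 else "finish"  # stand-in for the model's choice
        if action == "finish":
            log.append("finish")
            break
        log.append(f"{action}: {tools[action](goal)}")
    return log

tools = {"search": lambda q: f"top result for '{q}'"}  # toy tool
```

The safety challenges mentioned above arise precisely because each loop iteration can have real-world effects before a human reviews it.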
Open source AI means the model weights (and often training code) are publicly available — anyone can download and run the model. Llama (Meta), Mistral, and Falcon are open-weight models in this sense, though their licences vary and some are not open source in the strict software sense. Closed source means the weights are proprietary — you can only access the model through an API or product. GPT-4 (OpenAI), Claude (Anthropic), and Gemini (Google) are closed source. The debate between open and closed AI is active and significant: open models offer transparency and accessibility; closed models allow tighter safety controls and ongoing improvement without publishing weights that could be misused.
An open protocol created by Anthropic that standardises how AI models connect to external tools, data sources, and services. Think of it as a universal connector: instead of each AI tool requiring custom integration with each data source, MCP provides a standard interface. This allows Claude (and other AI systems adopting the protocol) to connect to Google Drive, Slack, databases, APIs, and other tools through a consistent method. Announced in November 2024, MCP has been widely adopted in the developer community.
The field of research and practice focused on ensuring AI systems behave as intended, do not cause unintended harm, and remain under meaningful human control as they become more capable. AI safety encompasses: alignment research (ensuring AI goals match human intentions), interpretability research (understanding what is happening inside AI models), robustness (ensuring AI performs safely even in unusual situations), and policy work. Anthropic, DeepMind, and OpenAI all have substantial safety research teams.
The challenge of ensuring AI systems pursue goals that are beneficial to humans — that they do what we intend, not just what we literally specify. The alignment problem becomes more significant as AI systems become more capable: a highly capable AI optimising for a slightly misspecified goal could cause serious harm. Current alignment approaches include RLHF, Constitutional AI, and interpretability research. Alignment is considered by many AI researchers to be the most important unsolved problem in AI development.
The architecture behind most AI image and video generators — including Stable Diffusion, Midjourney, and DALL-E. Diffusion models learn to generate images by learning to reverse a noise-adding process: during training, real images are progressively corrupted with noise; the model learns to denoise. During generation, starting from random noise, the model iteratively removes noise, guided by the text prompt, until a coherent image emerges. The technical elegance of this approach — and its ability to generate extraordinarily high-quality images — made it the dominant architecture for image generation from 2022 onwards.
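The iterative denoising loop can be illustrated with a heavily simplified one-dimensional toy: start from pure noise and repeatedly move a fraction of the way towards a denoised estimate. In a real diffusion model the denoiser is a trained neural network guided by the text prompt; here a stand-in that already knows the target plays that role, so this shows only the shape of the process, not the learning.

```python
import random

# Deliberately simplified denoising loop. The "denoiser" here is a stand-in
# that knows the clean answer; a real diffusion model's denoiser is a trained
# network conditioned on the text prompt.
def denoise_step(x: float, estimate: float, step: float = 0.2) -> float:
    return x + step * (estimate - x)      # move a fraction towards the estimate

def generate(target: float, steps: int = 50, seed: int = 0) -> float:
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)               # begin from random noise
    for _ in range(steps):
        x = denoise_step(x, target)
    return x
```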
The neural network architecture that underlies virtually all modern language models. Introduced in “Attention Is All You Need” (Vaswani et al., 2017 — arxiv.org/abs/1706.03762). The key innovation: the attention mechanism, which allows the model to weigh the relevance of every part of the input when processing each element — capturing long-range dependencies that previous recurrent architectures struggled with. GPT, BERT, Claude, Gemini, and virtually every other frontier model is a transformer or transformer-variant.
The core innovation of the transformer architecture. Attention allows the model to dynamically weight the importance of different parts of the input when generating each output token. When processing the word “bank” in a sentence, attention allows the model to attend more strongly to “river” or “money” depending on context, disambiguating meaning. Self-attention means the model attends to different parts of the same sequence. Multi-head attention runs multiple attention operations in parallel, each learning to focus on different types of relationships.
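Scaled dot-product attention — the standard formulation from the transformer paper — can be computed by hand on toy vectors: each query scores every key, the scores become weights via a softmax, and the output is the weighted sum of the values. The two-dimensional vectors below are illustrative stand-ins.

```python
import math

# Scaled dot-product attention for a single query over toy vectors.
def attention(query: list[float], keys: list[list[float]],
              values: list[list[float]]) -> list[float]:
    d = len(query)
    # Score the query against every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(values[0])
    # Output is the attention-weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

A query that points the same way as the first key pulls the output towards the first value — the “attend more strongly to the relevant context” behaviour described above, in miniature.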
The numerical values inside a neural network that are adjusted during training. A model with 70 billion parameters has 70 billion individual numbers, each of which encodes some aspect of the patterns the model has learned. Larger parameter counts generally (though not always) correlate with more capable models. GPT-4’s parameter count has not been officially disclosed; estimates range from 200 billion to over a trillion. Llama 3 comes in variants from 8B to 70B parameters. The relationship between parameter count and capability is complex — architecture, training data quality, and training techniques all matter.
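Parameter counts translate directly into memory: weights occupy parameter count times bytes per parameter. A back-of-envelope sketch (weights only — serving a model also needs memory for activations and the attention cache):

```python
# Back-of-envelope memory footprint: parameter count x bytes per parameter.
# Covers the weights only, not activations or the attention cache.
def model_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9  # = GB

# A 70B-parameter model at 16-bit (2 bytes per parameter) precision:
footprint = model_memory_gb(70, 2)   # 140 GB of weights alone
```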
A way of representing words, sentences, images, or other data as lists of numbers (vectors) in a high-dimensional space — where semantically similar items are geometrically close together. “King” and “queen” have similar embeddings; “cat” and “dog” are closer to each other than either is to “algorithm.” Embeddings are the foundation of modern AI: they allow the model to perform mathematical operations on meaning. Semantic search (search by meaning rather than keyword) works by finding documents whose embeddings are closest to the query’s embedding.
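“Geometrically close” is usually measured with cosine similarity — how nearly two vectors point in the same direction. The three-dimensional vectors below are tiny illustrative stand-ins; real embeddings have hundreds or thousands of dimensions and are produced by a trained model.

```python
import math

# Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy embeddings mimicking the "cat/dog vs algorithm" intuition.
cat, dog, algorithm = [1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]
```

Semantic search is this comparison at scale: embed the query, then rank documents by the cosine similarity of their embeddings to it.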
Empirical relationships showing how model performance improves predictably as compute, data, and parameter count increase. The landmark scaling laws paper (Hoffmann et al., 2022 — the “Chinchilla” paper — arxiv.org/abs/2203.15556) showed that prior models were undertrained relative to their size — that a smaller model trained on more data could outperform a larger model trained on less. Scaling laws have driven the strategy of building ever-larger models on ever-larger datasets, and remain central to how AI companies plan their research roadmaps.
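The Chinchilla result is often summarised as a rule of thumb: a compute-optimal model wants roughly 20 training tokens per parameter. That constant is an approximation of the paper’s fitted curves, not an exact law, but it makes the “undertrained” finding concrete:

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter for a
# compute-optimal model. An approximation of the paper's fitted scaling
# laws, not an exact constant.
def chinchilla_optimal_tokens(params: float) -> float:
    return 20 * params

# A 70B-parameter model would want roughly 1.4 trillion training tokens:
tokens_needed = chinchilla_optimal_tokens(70e9)
```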
The high-dimensional internal representation space of a neural network — where the model’s learned concepts exist as geometric relationships between vectors. When a diffusion model generates an image, it works in latent space (a compressed representation) rather than pixel space, which is computationally more efficient. Concepts that are related occupy nearby regions of latent space. Interpolating between two points in latent space produces outputs that blend characteristics of both endpoints — a technique used in creative AI applications.
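The interpolation mentioned above is, in its simplest form, a linear blend between two latent vectors: at t = 0 you get the first point, at t = 1 the second, and values in between mix the two. A minimal sketch:

```python
# Linear interpolation between two points in a latent space.
# t = 0 returns point a, t = 1 returns point b, values between blend them.
def lerp(a: list[float], b: list[float], t: float) -> list[float]:
    return [(1 - t) * x + t * y for x, y in zip(a, b)]
```

Decoding the midpoint of two image latents is what produces the characteristic “blend of both” outputs in creative applications (generative models often use spherical rather than linear interpolation in practice, but the idea is the same).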
A model trained by OpenAI that learns to connect images and text — understanding which captions describe which images by training on 400 million image-text pairs. CLIP representations are used in most AI image generation systems: the text prompt is encoded by a CLIP text encoder into an embedding, which guides the image generation process. CLIP’s ability to map language and images into a shared embedding space was foundational to making text-to-image generation practical. Original paper: arxiv.org/abs/2103.00020.
A model architecture that uses multiple specialised sub-networks (“experts”), with a routing mechanism that selects which experts to activate for each input. Rather than all parameters being active for every query, only a subset are used — making inference more efficient without sacrificing capability. GPT-4 is believed to use MoE architecture. Mixtral (Mistral AI) explicitly uses MoE. The advantage: a model can have a large total parameter count (suggesting broad knowledge) while only activating a fraction of those parameters per inference (reducing computational cost).
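The routing step can be sketched directly: a gate scores every expert, only the top-k actually run, and their outputs are combined weighted by the renormalised gate scores. The “experts” below are toy functions standing in for full sub-networks.

```python
# Sparse MoE routing sketch: only the top-k experts run for a given input,
# and their outputs are combined weighted by renormalised gate scores.
def moe_forward(x: float, experts: list, gate_scores: list[float], k: int = 2) -> float:
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / total * experts[i](x) for i in top)

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1]   # toy "experts"
```

The efficiency gain is visible in the structure: three experts exist, but only two execute per input — in a real MoE model, that is the difference between activating all parameters and a fraction of them.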
A technique for reducing the memory and compute requirements of a model by representing its weights in lower precision — from 32-bit or 16-bit floating point numbers to 8-bit, 4-bit, or even 2-bit integers. A model quantised to 4-bit requires roughly one-eighth the memory of its 32-bit version, at some cost to output quality. Quantisation is what allows large models to run on consumer hardware — a 70B parameter model at 4-bit quantisation can run on a high-end consumer GPU. Used extensively in the open-source model community (Llama, Mistral) for local deployment.
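The core idea can be shown with minimal symmetric quantisation: map floats to small integers with a single scale factor, then reconstruct. Production schemes (as used for Llama and Mistral deployments) use per-block scales and more careful rounding; this sketch shows only the round-trip and its error.

```python
# Minimal symmetric quantisation sketch: map floats to 4-bit integers
# (range -8..7, using -7..7 here) with one shared scale factor. Real
# schemes use per-block scales; this shows only the core idea.
def quantize(weights: list[float], bits: int = 4):
    qmax = 2 ** (bits - 1) - 1                     # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]
```

The reconstruction error is bounded by the scale factor — which is the “some cost to output quality” mentioned above, traded for an 8x memory saving versus 32-bit.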
The research field focused on understanding what is happening inside neural networks — what concepts individual neurons or circuits represent, how information flows through the network, and why the model produces particular outputs. Current frontier models are largely black boxes: we can observe inputs and outputs but not the internal reasoning process. Interpretability research (Anthropic has published extensively in this area under the banner of mechanistic interpretability) aims to open that black box — both for safety reasons and to improve model capabilities.
Training is the process of building a model — adjusting billions of parameters over weeks or months on massive datasets. Inference is using a trained model to generate outputs — what happens every time you send a message to ChatGPT. Training is far more computationally expensive than inference, but inference at the scale of millions of daily users also requires enormous infrastructure. The economics of AI products are heavily influenced by inference costs — which is why model efficiency (producing high-quality outputs at lower inference cost) is a major competitive dimension alongside raw capability.