Agentic AI

What is Agentic AI?

Standard AI answers questions. Agentic AI completes projects. This is the guide to what that difference actually means — and why it matters more than any AI development since ChatGPT.


The one-sentence difference

Every AI tool you have used so far — ChatGPT, Claude, Gemini, Copilot — works the same basic way: you type something, it responds, the exchange is over. You are in control of every step. You decide what to ask next. The AI waits.

Agentic AI works differently. You describe a goal. The AI figures out the steps required to reach it, takes those steps itself — including using tools, searching the internet, writing and running code, and checking its own results — and keeps going until the goal is achieved. You are not driving. The AI is.

The word "agentic" comes from "agency" — the capacity to act independently in pursuit of a goal. An AI agent has agency. A standard AI chatbot does not.

The clearest way to think about it: A standard AI is like a very knowledgeable person you can ask questions. An AI agent is like a very capable person you can give a project to.

A concrete example

Say you ask an AI to research the top five competitors to your business, summarise their pricing, and produce a comparison table.

Standard AI (ChatGPT, Claude, Gemini): Gives you an answer based on what it was trained on — which may be months or years out of date. It cannot visit websites. It cannot check current pricing. What it produces is a best guess from old data.

Agentic AI: Opens each competitor's website. Reads the pricing pages. Checks when they were last updated. Cross-references against any public announcements. Writes the comparison table. Then tells you: "Three of these pages had last-updated dates from Q4 2024 — you may want to verify directly."

Same request. Completely different process and result.

Why this is new

The ability to use tools is the key development. For most of AI's history, language models could only do one thing: generate text. They could not visit websites. They could not run code. They could not send emails or read files or query databases. They were minds with no hands.

The combination of a reasoning model with the ability to use tools — and the ability to decide which tools to use, in what order, and how to interpret the results — is what creates an agent. The reasoning was always there. The tools are what changed everything.

What agentic AI is not

It is not magic, and it is not sentient. An agent does not understand goals the way a person does. It follows a structured loop: observe the current state, decide on an action, take the action, observe the new state, repeat. That loop can produce remarkably capable behaviour. But it is still a loop running on a statistical model, not genuine understanding.

It is also not always better than standard AI. For a question with a clear answer — "what is the capital of France?" — the overhead of an agent is wasteful. Agentic AI earns its complexity when the task requires multiple steps, external information, or actions that change the world in some way.

Where you encounter it today

You may already be using agentic AI without knowing the term. These are all agents in practice:

  • ChatGPT with browsing and code execution on — it decides when to search, when to write code, when to interpret results
  • Claude with tool use enabled — reads files, uses connected apps, chains steps together
  • Perplexity — automatically searches multiple sources and synthesises results for every query
  • Zapier AI and Make AI — trigger actions across apps based on conditions, without manual input
  • GitHub Copilot Workspace — takes a bug report or feature request and proposes, writes, and tests code changes across a whole repository

The short version: If an AI can take actions in the world — not just generate text — it is operating as an agent. The more it can decide which actions to take, the more agentic it is.

The four components of every AI agent

Every AI agent — regardless of which framework built it or which model powers it — has four components. These are not optional. Remove any one of them and the system is no longer an agent in the full sense.

Perception

What the agent can observe. Text, files, web pages, API responses, images, database results. The agent's view of the world is limited to what it can perceive.

Memory

What the agent can remember. Within a session (context window), and optionally across sessions (external storage). Memory determines whether the agent can learn from its own actions.

Planning

How the agent decides what to do next. It breaks a goal into steps, selects which tool to use, evaluates the result, and decides whether to continue, backtrack, or stop.

Action

What the agent can do. Search the web, run code, read and write files, call APIs, send messages, create calendar events. The range of available tools defines the range of possible actions.

The agent loop — step by step

An agent does not think and then act once. It runs a continuous loop until the task is complete or it reaches a stopping condition. The loop works like this:

  1. Receive goal — The user or system provides a goal. This could be a natural language instruction ("research our top five competitors") or a structured task object from another agent.
  2. Plan — The agent uses its reasoning model to break the goal into steps. It identifies which tools it will need. If it is a capable model, it may also anticipate likely obstacles and plan for them.
  3. Act — The agent executes the next planned action — a web search, a file read, a code execution, an API call.
  4. Observe — The result is fed back into the context. The agent now has new information. Its next decision is made with that information present.
  5. Evaluate — Is the goal achieved? Is more action needed? Should the approach change? The agent assesses its current state against the original goal.
  6. Continue or stop — If more work remains, the loop repeats from step 3. If the goal is met, the agent produces a final output and halts.
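The six steps above can be sketched as a single loop. Everything here is a stand-in — `plan_step`, `execute`, and `is_done` are hypothetical callbacks, not any framework's real API — but the control flow is the one described:

```python
# Minimal sketch of the agent loop. The planner, executor, and goal-check
# are hypothetical stand-ins for a model and its tools.

def run_agent(goal, plan_step, execute, is_done, max_steps=10):
    """Run the plan-act-observe-evaluate loop until the goal is met
    or a stopping condition (the step cap) is reached.

    plan_step(goal, history) -> next action to take        (step 2)
    execute(action)          -> observation of that action (steps 3-4)
    is_done(goal, history)   -> True when goal is achieved (step 5)
    """
    history = []                      # accumulated context: (action, observation)
    for _ in range(max_steps):        # hard cap: a stopping condition (step 6)
        if is_done(goal, history):
            break                     # goal met: halt and return
        action = plan_step(goal, history)      # decide the next action
        observation = execute(action)          # act
        history.append((action, observation))  # feed the result back
    return history

# Toy usage: "count to 3" stands in for a real goal.
history = run_agent(
    goal=3,
    plan_step=lambda goal, h: len(h) + 1,
    execute=lambda action: action,
    is_done=lambda goal, h: len(h) >= goal,
)
```

The `max_steps` cap matters in practice: without an external stopping condition, a confused agent can loop indefinitely.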

This loop is sometimes called the ReAct pattern (Reasoning + Acting), a term from a 2022 research paper by Yao et al. at Princeton and Google Brain that formalised how reasoning and tool use could be interleaved.

How agents differ from standard AI — the full comparison

| Dimension | Standard AI (LLM) | AI Agent |
| --- | --- | --- |
| Input | A prompt | A goal |
| Output | A response | A completed task |
| Control | Human decides every next step | Agent decides next steps autonomously |
| Tools | None (or manual) | Search, code execution, APIs, files, more |
| Memory | Within one conversation | Within session + optionally persistent |
| Self-correction | Only if prompted | Evaluates its own output and retries |
| Best for | Questions, drafts, analysis, ideas | Multi-step tasks, research, workflows, automation |
| Risk | Hallucination in responses | Hallucination + wrong actions with real consequences |

Types of agents in practice

Not all agents are the same. The AI research community has converged on a few broad categories based on how the agent plans and acts:

Single-agent systems

One model with access to tools. The model receives a goal, plans steps, uses tools, and produces an output. Most of the agentic AI tools available to consumers today are single-agent systems — ChatGPT with tools enabled, Claude with tool use, Perplexity's search synthesis. Simple, direct, effective for most tasks.

Multi-agent systems

Multiple agents working together, each with a specific role. An orchestrator agent receives the goal and breaks it into sub-tasks. Specialist agents — a researcher, a writer, a fact-checker, a coder — each handle one sub-task. The orchestrator assembles the results. This mirrors how a team of humans operates. Multi-agent systems are covered in full in their own guide.

Tool-augmented agents

The most common form. The core is a language model; the augmentation is a set of tools it can call. The model decides when to use which tool. Tools are defined as functions with descriptions the model can read — it decides to call "search_web" or "run_python_code" or "read_file" based on what the task requires.
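The selection mechanism can be sketched as a registry of named, described functions. The tool names and the keyword-matching "model" below are illustrative stand-ins — in a real system the language model itself reads the descriptions and picks a tool:

```python
# Sketch of a tool-augmented agent's registry. Tool names, descriptions,
# and the keyword-based selector are illustrative, not a real API.

TOOLS = {
    "search_web": {
        "description": "Query a search engine and return result snippets.",
        "fn": lambda query: f"results for {query!r}",
    },
    "run_python_code": {
        "description": "Evaluate a Python expression and return its value.",
        "fn": lambda code: str(eval(code)),  # toy only; never eval untrusted input
    },
}

def choose_tool(task: str) -> str:
    # Stand-in for the model reading the descriptions and choosing a tool.
    return "run_python_code" if "compute" in task else "search_web"

def dispatch(task: str, payload: str) -> str:
    name = choose_tool(task)
    return TOOLS[name]["fn"](payload)
```

The key design point survives the simplification: the tools are ordinary functions, and the intelligence is in choosing which one to call and with what input.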

RAG-augmented agents

RAG stands for Retrieval-Augmented Generation. The agent can query a knowledge base — a company's internal documents, a database of research papers, a product catalogue — before deciding how to respond or act. This gives the agent access to specific, current, or proprietary information that was not in its training data. LlamaIndex specialises in this pattern.
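The retrieval step can be sketched with a toy ranking function. Production systems rank by vector-embedding similarity rather than the word overlap used here; the documents are invented for illustration:

```python
# Toy retrieval step for a RAG-augmented agent: rank documents by word
# overlap with the query. Real systems use embedding similarity instead.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    q_words = set(query.lower().split())
    # Sort documents by how many query words they share (highest first).
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

docs = [
    "Pricing for the premium plan starts at 40 euros",
    "Our team page lists all employees",
    "Premium plan pricing was updated in Q4",
]
top = retrieve("premium plan pricing", docs, k=2)
```

The retrieved passages are then placed in the agent's context before it decides how to respond or act.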

The tools an agent can use

Tools are defined as functions with a name, description, and parameter schema. The agent's model reads the description and decides whether to call the function. Common tools in production agent systems include:

  • Web search — query a search engine and receive results
  • Web browsing — retrieve and parse the full content of a URL
  • Code execution — write and run Python, JavaScript, or other code; receive the output
  • File operations — read, write, edit, and create files
  • Database queries — query structured data sources via SQL or API
  • API calls — interact with external services (calendar, email, CRM, payments, maps)
  • Memory operations — store and retrieve information across sessions
  • Sub-agent calls — in multi-agent systems, call a specialist agent as if it were a tool

The emerging standard for how tools are defined and connected to agents is MCP (Model Context Protocol), developed by Anthropic and adopted across the major frameworks. MCP defines a universal interface — a "USB port" for AI agents — so any tool built to the MCP standard works with any agent built to the MCP standard. Official documentation is at modelcontextprotocol.io.

Technical foundations

Agentic AI builds on large language models (LLMs) but extends them in specific, documented ways. Understanding what is actually happening at the implementation level requires understanding three things: how tool use works at the model API level, how the agent loop is formalised, and what the current research says about the limits of the approach.

Tool use at the API level

Tool use is not a separate model — it is a feature of how modern LLM APIs accept and return structured data. The Anthropic Claude API, OpenAI API, and Google Gemini API all implement tool use in broadly the same way, following a pattern that has become a de facto standard.

The developer defines tools as JSON schema objects. Each tool has a name, a description written in natural language, and an input schema specifying what parameters the tool accepts and their types. This schema is passed to the model alongside the user's prompt as part of the API request.
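A tool definition in this style might look like the following. The field names (`name`, `description`, `input_schema`) follow the Anthropic-style shape; other providers use slightly different keys, so verify against your provider's documentation. The tool itself is hypothetical:

```python
# One hypothetical tool described as a JSON-schema object, in roughly the
# shape the major LLM APIs accept (exact field names vary by provider).
import json

get_pricing_tool = {
    "name": "get_pricing_page",
    "description": "Fetch the pricing page of a company website "
                   "and return its text content.",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "Full URL of the pricing page to fetch.",
            },
        },
        "required": ["url"],
    },
}

# The definition is plain, serialisable JSON -- this is what travels
# alongside the prompt in the API request.
request_fragment = json.dumps({"tools": [get_pricing_tool]})
```

Note that nothing here executes anything: the schema only tells the model what the tool is called, what it does, and what arguments it takes.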

The model's response can then include, instead of or alongside text, a tool_use content block specifying which tool to call and what parameters to pass. The calling application executes the tool — the model itself never directly executes code or accesses the internet — and returns the result as a tool_result content block. The model then continues its response with that result in context.

This is the fundamental loop. The model reasons, selects a tool, the application runs the tool, the result comes back, the model reasons again. The model is a reasoning engine; the application is an execution engine. The two are distinct.

The ReAct framework

The formal academic basis for most production agent implementations is the ReAct framework, introduced in the paper "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022, arXiv:2210.03629). ReAct demonstrated that interleaving chain-of-thought reasoning with action steps — rather than reasoning first and then acting — produced substantially better performance on knowledge-intensive and decision-making tasks.

The ReAct loop is: Thought → Action → Observation → Thought → Action → Observation … where each Thought is the model's explicit reasoning about what to do next, each Action is a tool call, and each Observation is the result. Making the reasoning visible (rather than implicit) both improves accuracy and allows humans or other systems to inspect and validate the agent's decision-making.
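A ReAct trace is just this alternation written out explicitly. The sketch below formats a scripted trace — the thoughts, actions, and observations are invented examples, where a real agent's would come from the model and its tools:

```python
# Sketch of a ReAct-style trace: explicit Thought / Action / Observation
# lines. The content is scripted here, not produced by a real model.

def react_trace(steps):
    """steps: list of (thought, action, observation) tuples."""
    lines = []
    for thought, action, observation in steps:
        lines.append(f"Thought: {thought}")
        lines.append(f"Action: {action}")
        lines.append(f"Observation: {observation}")
    return "\n".join(lines)

trace = react_trace([
    ("I need current pricing, so I should search.",
     "search_web('Acme pricing page')",
     "Found https://acme.example/pricing"),
    ("I have the URL; now fetch the page.",
     "browse('https://acme.example/pricing')",
     "Premium plan: 40 euros/month"),
])
```

Because each Thought line is explicit text, a human (or a monitoring system) can read the trace and check whether the agent's reasoning justified each action.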

LangChain, CrewAI, LlamaIndex, and AutoGen all implement variants of this pattern, with additional abstractions for memory management, multi-agent coordination, and error handling layered on top.

Context window as the limiting constraint

Every agent's capability is bounded by its context window — the amount of text the model can hold in working memory at once. As an agent completes steps, each tool result is appended to the context. Long-running tasks accumulate context. When the context limit is reached, the agent must either summarise or discard earlier information.

Context management is one of the most actively researched areas in agentic AI. Current approaches include:

  • Summarisation — the agent periodically compresses earlier parts of the context into a shorter summary
  • External memory — relevant results are written to a vector database and retrieved only when needed, rather than kept in the context continuously
  • Scratchpads — intermediate results are written to files and read back selectively
  • Hierarchical agents — sub-agents handle portions of the task, returning only summaries to the orchestrator rather than full execution traces
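The summarisation approach can be sketched as a trimming function. The character budget and the placeholder summary line are simplifications — a real agent would count tokens and ask the model to write the summary:

```python
# Sketch of context summarisation: when the accumulated messages exceed
# a budget, drop the oldest ones and replace them with one summary line.
# A real agent counts tokens and has the model write the summary.

def trim_context(messages: list[str], max_chars: int = 200) -> list[str]:
    dropped = []
    # Pop oldest messages until the remainder fits the budget.
    while sum(len(m) for m in messages) > max_chars and len(messages) > 1:
        dropped.append(messages.pop(0))
    if dropped:
        # Stand-in for a model-generated compression of the dropped steps.
        messages.insert(0, f"[summary of {len(dropped)} earlier steps]")
    return messages

context = ["x" * 100, "y" * 100, "z" * 100, "final answer draft"]
context = trim_context(context, max_chars=150)
```

The trade-off is visible even in the toy: the agent keeps its most recent observations verbatim and retains only a lossy record of the early ones.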

Safety: the OWASP Top 10 for Agentic AI

The Open Worldwide Application Security Project (OWASP) published the first formal taxonomy of agentic AI security risks in 2026. Understanding these is not optional for anyone deploying agent systems. The ten risk categories, in brief:

  1. Prompt injection — malicious content in the environment (a web page, a document) instructs the agent to change its behaviour
  2. Insecure tool permissions — agent granted broader access than the task requires; exploited to take unintended actions
  3. Uncontrolled resource consumption — agent enters a loop or takes unnecessarily expensive actions; API costs escalate without limit
  4. Excessive autonomy — agent takes consequential, irreversible actions without requiring human confirmation
  5. Cascading failures in multi-agent systems — one agent's error propagates through the system before it can be caught
  6. Memory poisoning — malicious content stored in the agent's persistent memory influences future behaviour
  7. Credential and data exposure — tools require credentials; improperly stored credentials can be leaked through the agent's outputs
  8. Goal misalignment — agent optimises for a proxy metric rather than the actual intent; achieves the letter of the goal but not the spirit
  9. Inadequate logging — agent actions are not logged; post-incident investigation is impossible
  10. Supply chain risks — third-party tools or models integrated into an agent pipeline introduce vulnerabilities
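Two of the mitigations implied by this list — a step budget against runaway loops (risk 3) and a human-confirmation gate before irreversible actions (risk 4) — can be sketched as a wrapper around tool execution. The action names and the `confirm` callback are illustrative:

```python
# Sketch of two mitigations: a step budget (risk 3) and a human-approval
# gate for irreversible actions (risk 4). Names are illustrative.

IRREVERSIBLE = {"send_email", "delete_file", "make_payment"}

def guarded_execute(action, run, confirm, budget, max_steps=20):
    """Run `action` only if the step budget allows and, for irreversible
    actions, only after `confirm(action)` returns True."""
    if budget["steps"] >= max_steps:
        raise RuntimeError("step budget exhausted")  # stop runaway loops
    if action in IRREVERSIBLE and not confirm(action):
        return f"blocked: {action} requires human approval"
    budget["steps"] += 1
    return run(action)

budget = {"steps": 0}
ok = guarded_execute("search_web", run=lambda a: f"ran {a}",
                     confirm=lambda a: False, budget=budget)
blocked = guarded_execute("send_email", run=lambda a: f"ran {a}",
                          confirm=lambda a: False, budget=budget)
```

Production systems layer the same ideas with more rigour — cost tracking, per-tool permission scopes, and audit logging of every call (risk 9).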

Full documentation at owasp.org.

Regulatory context — EU AI Act

The EU AI Act (Regulation 2024/1689), which came into force in August 2024, classifies certain agentic AI deployments as high-risk systems subject to mandatory conformity assessments, documentation requirements, and human oversight obligations. The high-risk classification applies when an agent system is deployed in domains including education, employment, critical infrastructure, law enforcement, and essential private services. Developers and deployers of agent systems in the EU should consult the Act directly at eur-lex.europa.eu. High-risk provisions take full effect from August 2026.

Key research papers

  • ReAct (2022) — Yao et al., arXiv:2210.03629 — foundational paper formalising the reasoning-action interleaving pattern
  • Toolformer (2023) — Schick et al., Meta AI, arXiv:2302.04761 — demonstrated that language models can learn to use tools in a self-supervised manner
  • AutoGPT (2023) — Significant Gravitas — first widely adopted open-source autonomous agent, demonstrated the pattern at scale
  • Chain-of-Thought Prompting (2022) — Wei et al., Google Brain, arXiv:2201.11903 — established the value of explicit reasoning steps, a prerequisite for reliable agent planning
  • LLM as Agent Survey (2023) — Wang et al., arXiv:2308.11432 — comprehensive survey of agentic AI architectures and research directions

Source note: Technical specifications in this guide are drawn from official API documentation (Anthropic, OpenAI, Google), the cited research papers, and the OWASP Agentic AI Security Project. All links above are to primary sources.