Voice Agents

Vapi

Vapi is an API platform for building AI voice agents that make and receive real phone calls. You describe what the agent should do and connect it to a phone number, and it handles calls around the clock: booking appointments, answering questions, qualifying leads, and conducting surveys. It supports multiple languages and targets sub-400ms response latency, which is what makes an exchange feel like a real conversation.

What Vapi does

Vapi lets you build AI voice agents that talk on the phone. A Vapi agent picks up incoming calls or makes outbound calls, converses naturally with the person on the other end, takes actions (looking up information, booking appointments, updating records), and handles the call from start to finish without a human involved.

The defining capability is that a call sounds like a real conversation. Vapi targets end-to-end latency of 400ms or less, measured as the gap between the caller finishing speaking and the agent starting to respond. As a rule of thumb, a conversation feels natural below 500ms and robotic above 1000ms. This latency target is what separates purpose-built voice AI platforms from general AI tools attempting voice.

Real use cases live in 2026 include medical clinic appointment booking (for example, the Chennai clinic case that appears in many AI training videos, handling 1,000+ calls/day across 12 clinics), real estate lead qualification, restaurant reservations, first-line customer support triage, insurance claim intake, and employee HR Q&A. The common thread is high-volume, repetitive phone conversations that follow a predictable script.

What Vapi handles

  • Inbound calls — answer calls on your business number 24/7, handle any volume
  • Outbound calls — call lists of numbers for appointment reminders, follow-ups, surveys
  • Multi-language — switch languages mid-call or route based on detected language
  • Tool calling — look up records, book appointments, update CRMs via API calls during the conversation
  • Escalation — detect keywords or sentiment that trigger transfer to a human agent, with context passed along
  • Call recording and transcripts — every call is recorded and transcribed automatically

What Vapi does not do

Vapi does not replace human agents for complex, high-stakes, or emotionally sensitive conversations — complaints, medical diagnoses, legal advice, crisis situations. These require human judgment, empathy, and accountability that current voice AI cannot provide. Vapi is most effective for the high-volume, structured conversations that do not require these qualities.

Building a voice agent with Vapi

A Vapi agent is configured with three core components: the system prompt (who the agent is, what it should do, how it should behave), tools (APIs the agent can call during the conversation to take actions), and telephony (a phone number it is connected to). Building a basic agent requires defining only these three things. Simple use cases need no coding: the Vapi dashboard provides a no-code configuration interface.
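
The three components can be pictured as a small data structure. The sketch below is only an illustration in Python: the class and field names (`AgentConfig`, `Tool`, `system_prompt`, `phone_number`, `endpoint_url`) are assumptions chosen for readability and are not Vapi's actual configuration schema.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """A hypothetical tool the agent may call mid-conversation."""
    name: str
    description: str
    endpoint_url: str  # illustrative URL of your own API, not a Vapi field


@dataclass
class AgentConfig:
    """Hypothetical container for the three core components."""
    system_prompt: str
    phone_number: str
    tools: list[Tool] = field(default_factory=list)

    def missing_components(self) -> list[str]:
        """Return the names of core components that are still empty."""
        missing = []
        if not self.system_prompt.strip():
            missing.append("system_prompt")
        if not self.tools:
            missing.append("tools")
        if not self.phone_number.strip():
            missing.append("phone_number")
        return missing


config = AgentConfig(
    system_prompt="You are a receptionist for a dental clinic.",
    phone_number="+15550100",
    tools=[Tool("check_availability", "Look up free slots", "https://example.com/slots")],
)
```

A check like `config.missing_components()` is one way to confirm all three pieces are in place before pointing real calls at an agent.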

For production agents, additional configuration matters: voice selection (which TTS model, which voice), silence detection settings (how long to wait before speaking again), interruption handling (what happens when the caller talks over the agent), and fallback behaviour (what to say if the agent does not understand).

Write a system prompt for an appointment booking agent
Write a Vapi system prompt for an AI receptionist for [type of business, e.g. a dental clinic / a hair salon / a physiotherapy practice]. The agent should:
  - Greet callers professionally
  - Handle appointment booking, rescheduling, and cancellation
  - Answer basic questions about [services, hours, location]
  - Transfer to a human for complaints or complex queries
  - Speak in [language(s)]
Include specific instructions for how to handle: callers who don't respond, repeated misunderstandings, and urgent situations.
Design a lead qualification agent
Write a Vapi system prompt for an outbound sales agent that calls leads from [type of company, e.g. a real estate agency / a solar panel installer / an insurance broker]. The agent should:
  - Introduce itself clearly and non-deceptively as an AI
  - Ask qualifying questions to determine if the lead is a good fit: [list 3-5 key questions]
  - If qualified, offer to connect to a human sales rep
  - If not qualified, politely end the call and log the reason
  - Maximum call duration: [X] minutes
Write the exact opening line the agent should use.
Build an FAQ response agent
I want a Vapi agent that answers frequently asked questions about my [business type]. Here are the 10 most common questions and their answers: [paste Q&A pairs] Write a system prompt that: uses these exact answers, knows when a question is outside its scope and says so honestly, always offers to connect to a human if the caller is unsatisfied, and never makes up information it does not have.
Handle call escalation properly
Write the escalation logic for a Vapi voice agent. The agent should escalate to a human in the following situations:
  1. The caller explicitly asks for a human
  2. The caller expresses strong frustration or anger (detect words like: [list keywords])
  3. The query involves [sensitive topics, e.g. a complaint / a billing dispute / a medical emergency]
  4. The agent has failed to understand the caller twice in a row
When escalating: explain what is happening, keep the caller on hold with music, and pass a summary of the conversation to the human agent.
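
The four escalation conditions in this prompt can also be written as plain decision logic, which helps when reviewing what the agent is supposed to do. This is a minimal sketch, not Vapi functionality: `CallState`, `escalation_reason`, and the keyword lists are hypothetical names and placeholder values that you would replace with your own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallState:
    """Hypothetical per-call state tracked by your own application."""
    consecutive_misunderstandings: int = 0


# Placeholder phrase lists; fill these in for your own business.
HUMAN_REQUEST_PHRASES = ("speak to a human", "real person", "talk to someone")
FRUSTRATION_KEYWORDS = ("ridiculous", "useless", "angry")
SENSITIVE_TOPICS = ("complaint", "billing dispute", "emergency")


def escalation_reason(utterance: str, state: CallState) -> Optional[str]:
    """Return why the call should go to a human, or None to continue."""
    text = utterance.lower()
    if any(phrase in text for phrase in HUMAN_REQUEST_PHRASES):
        return "caller_requested_human"
    if any(word in text for word in FRUSTRATION_KEYWORDS):
        return "frustration_detected"
    if any(topic in text for topic in SENSITIVE_TOPICS):
        return "sensitive_topic"
    if state.consecutive_misunderstandings >= 2:
        return "repeated_misunderstanding"
    return None
```

Returning a reason string rather than a boolean gives the human agent context about why the call was handed over.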
Set up outbound appointment reminders
I want to use Vapi to call patients/customers 24 hours before their appointment. Each call should:
  - Confirm the appointment details (name, date, time, location)
  - Offer to confirm, reschedule, or cancel
  - If rescheduling: collect their preferred new time and update the booking system via webhook
  - If cancelling: confirm the cancellation and update the system
  - Be under 2 minutes
Write the system prompt and describe the webhook structure for updating the booking system.
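
One possible shape for the booking-system webhook is sketched below. The payload fields (`appointment_id`, `action`, `new_time`) and both function names are assumptions made for illustration; they do not come from Vapi's documentation, and the in-memory dictionary stands in for a real booking system.

```python
from typing import Any, Optional

VALID_ACTIONS = {"confirm", "reschedule", "cancel"}


def build_booking_update(
    appointment_id: str, action: str, new_time: Optional[str] = None
) -> dict[str, Any]:
    """Build the (hypothetical) JSON body sent to the booking system."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "reschedule" and not new_time:
        raise ValueError("reschedule requires new_time")
    payload: dict[str, Any] = {"appointment_id": appointment_id, "action": action}
    if action == "reschedule":
        payload["new_time"] = new_time
    return payload


def apply_booking_update(bookings: dict[str, dict[str, str]], payload: dict[str, Any]) -> None:
    """Apply a payload to an in-memory stand-in for the booking system."""
    booking = bookings[payload["appointment_id"]]
    if payload["action"] == "confirm":
        booking["status"] = "confirmed"
    elif payload["action"] == "cancel":
        booking["status"] = "cancelled"
    else:  # reschedule
        booking["time"] = payload["new_time"]
        booking["status"] = "rescheduled"
```

Validating the action before sending keeps malformed requests from reaching the booking system when the model produces an unexpected value.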
Multi-language agent setup
I need a Vapi agent for [business type] that serves customers in [list languages — e.g. English, Hindi, and Tamil]. Design the language handling: (1) how to detect the caller's preferred language from their first sentence, (2) how to switch languages if the caller switches mid-call, (3) how to handle a language the agent does not support — what to say and what to do, (4) any cultural considerations for each language region that should affect the agent's tone or approach.
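
Points (2) and (3) of this prompt reduce to a small routing decision once a language has been detected. The sketch below assumes the detected language arrives as a two-letter code from whatever detection step you use; `choose_language` and the language table are hypothetical and not part of Vapi.

```python
SUPPORTED_LANGUAGES = {"en": "English", "hi": "Hindi", "ta": "Tamil"}
DEFAULT_LANGUAGE = "en"


def choose_language(detected_code: str, current_code: str = DEFAULT_LANGUAGE) -> tuple[str, bool]:
    """Return (language to respond in, whether to offer a human handoff).

    A supported language is adopted immediately, which also covers a
    mid-call switch. An unsupported one keeps the current language and
    flags the call for handoff.
    """
    if detected_code in SUPPORTED_LANGUAGES:
        return detected_code, False
    return current_code, True
```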
Measure voice agent performance
I have a Vapi voice agent handling [type of calls]. Define the KPIs I should track, explain how to measure each one in Vapi's analytics, and give me the benchmark values that indicate the agent is performing well vs needs improvement. KPIs should cover: call completion rate, task success rate, escalation rate, average call duration, caller satisfaction, and cost per call.
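
Most of the KPIs named in this prompt are simple ratios over call records. As a minimal sketch, assuming you export one record per call with the hypothetical fields shown in `CallRecord` (these are not Vapi analytics field names), they could be computed like this. Caller satisfaction is left out because it needs survey data rather than call metadata.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """Hypothetical per-call export; field names are assumptions."""
    completed: bool
    task_succeeded: bool
    escalated: bool
    duration_seconds: float
    cost_usd: float


def compute_kpis(calls: list[CallRecord]) -> dict[str, float]:
    """Compute rate and average KPIs over a non-empty list of calls."""
    n = len(calls)
    if n == 0:
        raise ValueError("need at least one call")
    return {
        "call_completion_rate": sum(c.completed for c in calls) / n,
        "task_success_rate": sum(c.task_succeeded for c in calls) / n,
        "escalation_rate": sum(c.escalated for c in calls) / n,
        "average_call_duration_seconds": sum(c.duration_seconds for c in calls) / n,
        "cost_per_call_usd": sum(c.cost_usd for c in calls) / n,
    }
```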
Compare Vapi vs Retell AI for my use case
I need a voice AI platform for [describe use case — type of calls, volume, languages, integrations needed, budget]. Compare Vapi and Retell AI specifically for this use case. For each: strengths, weaknesses, pricing for my expected volume, and which would be easier to set up and maintain.
Write a post-call summary webhook
After each Vapi call, I want to automatically: (1) save the transcript to [CRM/spreadsheet/database], (2) categorise the call by outcome [list categories], (3) flag calls that need human follow-up. Write a webhook handler specification — the data Vapi sends after each call and the logic to process it — that I can use to build this automation.
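
The processing logic this prompt asks for might look like the sketch below. The payload shape (`call_id`, `transcript`) is an assumption, since the real fields Vapi sends should be taken from its documentation, and the keyword rules in `categorise` are deliberately naive placeholders.

```python
from typing import Any

# Placeholder phrases that suggest a human should follow up.
FOLLOW_UP_PHRASES = ("call me back", "speak to a human", "complaint")


def categorise(transcript: str) -> str:
    """Assign a coarse outcome category using simple keyword rules."""
    text = transcript.lower()
    if "cancel" in text:
        return "cancelled"
    if "reschedule" in text:
        return "rescheduled"
    if "book" in text:
        return "booked"
    return "other"


def process_call_report(payload: dict[str, Any]) -> dict[str, Any]:
    """Turn a (hypothetical) post-call payload into a record to store."""
    transcript = payload.get("transcript", "")
    category = categorise(transcript)
    needs_follow_up = category == "other" or any(
        phrase in transcript.lower() for phrase in FOLLOW_UP_PHRASES
    )
    return {
        "call_id": payload.get("call_id"),
        "transcript": transcript,
        "category": category,
        "needs_follow_up": needs_follow_up,
    }
```

The returned record is what you would write to the CRM, spreadsheet, or database. In practice the keyword rules would likely be replaced by an LLM classification step.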

Vapi's technical architecture

Vapi is a real-time AI voice pipeline that combines three stages: a speech-to-text (STT) model that converts caller audio to text with minimal latency, a large language model (LLM) that processes the transcript and generates a response, and a text-to-speech (TTS) model that converts the response to audio and streams it back to the caller. The target for the whole response path, from the end of the caller's speech to the start of the agent's audio, is under 400ms, which is the key threshold for natural-feeling conversation.
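
Because the three stages run in sequence, their latencies add up against the 400ms target. The sketch below shows that arithmetic; the per-stage numbers are illustrative placeholders, not measurements of Vapi or of any provider.

```python
LATENCY_BUDGET_MS = 400.0


def total_latency_ms(stage_latencies_ms: dict[str, float]) -> float:
    """Sum the latencies of sequential pipeline stages."""
    return sum(stage_latencies_ms.values())


def within_budget(
    stage_latencies_ms: dict[str, float], budget_ms: float = LATENCY_BUDGET_MS
) -> bool:
    """Check whether a turn fits inside the latency budget."""
    return total_latency_ms(stage_latencies_ms) <= budget_ms


# Placeholder figures for one conversational turn.
example_turn = {"stt": 100.0, "llm": 180.0, "tts": 90.0}
# The same turn with one extra sequential step added.
example_turn_with_extra_step = {**example_turn, "extra_step": 250.0}
```

With these placeholder numbers the plain turn totals 370ms and fits the budget, while the turn with an extra 250ms step totals 620ms and does not.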

Vapi is model-agnostic: users can choose STT models (Deepgram, Whisper, AssemblyAI), LLMs (GPT-4, Claude, Gemini, or a custom fine-tuned model), and TTS voices (ElevenLabs, Cartesia, PlayHT, Deepgram TTS). This modularity allows cost optimisation: cheaper models for simple use cases, more expensive ones only where quality requires it.
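
The cost trade-off can be estimated by adding up a per-minute rate for each stage. In the sketch below every rate is a made-up placeholder and the tier names (`budget`, `premium`) are hypothetical; substitute real prices from each provider before drawing conclusions.

```python
# Placeholder per-minute rates in USD. These are NOT real prices.
PLACEHOLDER_COST_PER_MINUTE_USD = {
    "stt": {"budget": 0.01, "premium": 0.02},
    "llm": {"budget": 0.02, "premium": 0.10},
    "tts": {"budget": 0.02, "premium": 0.08},
}


def cost_per_minute(tiers: dict[str, str]) -> float:
    """Sum the per-minute cost of the chosen tier for each stage."""
    return sum(
        PLACEHOLDER_COST_PER_MINUTE_USD[stage][tier] for stage, tier in tiers.items()
    )


all_budget = {"stt": "budget", "llm": "budget", "tts": "budget"}
premium_llm_only = {"stt": "budget", "llm": "premium", "tts": "budget"}
```

Comparing `cost_per_minute(all_budget)` with `cost_per_minute(premium_llm_only)` shows how upgrading a single stage changes the total while the others stay cheap.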

Tool calling during a call

The most powerful capability is real-time tool calling. During a conversation, the LLM can invoke external APIs (look up a patient record, check calendar availability, place a booking, query a product database) and incorporate the result into its response, ideally before the caller notices a pause. This requires three things: defining the tool functions in Vapi's configuration, building the API endpoints the tools call, and testing the latency impact of each tool call. Each call adds roughly 100-500ms depending on the API, so a slow tool can push a turn past the 400ms target.
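
On the endpoint side, testing latency impact can start with a small dispatcher that times each tool. This is a sketch of your own server-side code under stated assumptions, not Vapi's tool API: the `tool` decorator, `call_tool`, and the stubbed `check_availability` are hypothetical.

```python
import time
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so it can be dispatched."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def check_availability(date: str) -> list[str]:
    """Stub: a real version would query your calendar API."""
    return ["09:00", "14:30"] if date == "2026-05-01" else []


def call_tool(name: str, **kwargs: Any) -> tuple[Any, float]:
    """Run a registered tool and return (result, elapsed milliseconds)."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    start = time.perf_counter()
    result = TOOLS[name](**kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

Logging the elapsed time per tool makes it easy to spot which endpoint is responsible when a turn feels slow.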

Telephony integration

Vapi integrates with telephony providers including Twilio, Vonage, and others. You can provision phone numbers directly through Vapi or bring your own Twilio/Vonage numbers. SIP trunking is supported for enterprise integrations with existing phone systems. HIPAA-compliant configurations are available for healthcare use cases (Business Associate Agreement required — contact Vapi enterprise sales).

Comparison to building with raw APIs

The alternative to Vapi is building a voice pipeline from scratch: Twilio for telephony, Deepgram for STT, GPT-4 for LLM, ElevenLabs for TTS, with a WebSocket server managing the real-time audio streams. This is technically possible but requires significant engineering effort to handle the real-time audio streaming, latency optimisation, error handling, and concurrent call management that Vapi handles automatically. For most teams, Vapi's abstraction layer saves months of infrastructure work.
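
To give a sense of what the from-scratch route involves, the skeleton below wires three stub stages together with `asyncio` and handles several calls concurrently. Every function is a stand-in written for this illustration: a real build would replace the stubs with streaming connections to the telephony, STT, LLM, and TTS providers, which is where most of the engineering effort goes.

```python
import asyncio


async def speech_to_text(audio_chunk: bytes) -> str:
    """Stub STT stage: a real one would stream audio to a provider."""
    return audio_chunk.decode("utf-8")


async def generate_reply(transcript: str) -> str:
    """Stub LLM stage: a real one would call a language model."""
    return f"You said: {transcript}"


async def text_to_speech(text: str) -> bytes:
    """Stub TTS stage: a real one would return synthesised audio."""
    return text.encode("utf-8")


async def handle_turn(audio_chunk: bytes) -> bytes:
    """Run one caller turn through STT, LLM, and TTS in sequence."""
    transcript = await speech_to_text(audio_chunk)
    reply = await generate_reply(transcript)
    return await text_to_speech(reply)


async def handle_calls(chunks: list[bytes]) -> list[bytes]:
    """Process one turn from each of several calls concurrently."""
    return list(await asyncio.gather(*(handle_turn(c) for c in chunks)))
```

Running `asyncio.run(handle_calls([b"hello", b"hi"]))` passes each input through all three stub stages. Interruption handling, partial transcripts, and error recovery are all absent from this skeleton.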

Source note: Pricing and technical specifications from vapi.ai and docs.vapi.ai. Architecture from Vapi product documentation. All verified April 2026.