May 9, 2026

Build vs buy your AI voice agent? It is the wrong question.

Tech leaders ask me a recurring question: should we buy an off-the-shelf IVR or contact-centre platform, or build a Gen-AI voice agent ourselves?

In my experience, that framing leads to the wrong conversation. The decision that matters is which layers you buy and where you draw the build line.

For context: I shipped a voice agent for two Pilates studios in Portugal — Vonage for telephony, Vapi for the voice layer, OpenAI for reasoning, AWS Lambda for the orchestration webhook, PostgreSQL for the booking system, Pulumi for infrastructure. The studio owners never had to hire a receptionist.

The layered framework I would recommend:

Always buy
— Telephony and SIP trunking (Vonage, Twilio, Bandwidth). Carrier relationships and regulatory work take years.
— Voice infrastructure — TTS, STT, turn-taking (Vapi, Deepgram, Speechmatics, ElevenLabs, Azure Speech). Latency budgets and model drift are full-time problems for these vendors.
— The foundation model itself (OpenAI, Anthropic, Google, Mistral). Training a competitive model in-house is out of reach for almost every team, so this one is beyond debate.

Consider buying a full platform
A turnkey CCaaS or conversational AI platform (Talkdesk, Genesys Cloud, Amazon Connect, Five9, NICE CXone, Cognigy, Voiceflow) earns its place when:
— Call volumes sit in the 10k+ per day band, with queue management, agent failover and supervisor monitoring needed out of the box.
— Business logic can be expressed in a visual flow builder without losing fidelity.
— Compliance posture (PCI DSS, HIPAA, SOC 2) is something you would rather inherit from a certified vendor than build internally.
— Engineering capacity is the constraint, and owning the agent loop is not realistic in the next twelve months.

Always build
— The agent's reasoning, prompts and tool definitions. Your business differentiation lives here.
— Integrations to your domain systems — booking, CRM, ERP, billing. No vendor knows your data model.
— The evaluation suite that runs on every deploy. A small set of canonical scenarios with expected tool calls and expected final state is what separates "we ship Gen-AI" from "we ship Gen-AI with confidence." Vendor platforms will not test what you actually care about.
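
To make the evaluation-suite point concrete, here is a minimal sketch in Python of what "canonical scenarios with expected tool calls and expected final state" can look like. Everything named here is an assumption for illustration, not the studios' production code: the tool names (check_availability, create_booking, cancel_booking), the state shape, the class identifier, and stub_agent, which is a deterministic stand-in for the real LLM-plus-tools loop.

```python
import copy
from dataclasses import dataclass

CLASS_ID = "reformer-mon-0900"  # hypothetical class identifier


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Scenario:
    name: str
    caller_says: str
    initial_state: dict
    expected_calls: list
    expected_state: dict


def stub_agent(utterance, state):
    """Deterministic stand-in for the real agent loop (hypothetical)."""
    state = copy.deepcopy(state)  # never mutate the caller's state
    calls = []
    text = utterance.lower()
    # Check "cancel" first: "cancel my booking" also contains "book".
    if "cancel" in text:
        calls.append(ToolCall("cancel_booking", {"class_id": CLASS_ID}))
        if CLASS_ID in state["bookings"]:
            state["bookings"].remove(CLASS_ID)
            state["free_slots"] += 1
    elif "book" in text:
        calls.append(ToolCall("check_availability", {"class_id": CLASS_ID}))
        if state["free_slots"] > 0:
            calls.append(ToolCall("create_booking", {"class_id": CLASS_ID}))
            state["bookings"].append(CLASS_ID)
            state["free_slots"] -= 1
    return calls, state


SCENARIOS = [
    Scenario(
        name="book when a slot is free",
        caller_says="I'd like to book Monday's 9am reformer class",
        initial_state={"free_slots": 1, "bookings": []},
        expected_calls=[
            ToolCall("check_availability", {"class_id": CLASS_ID}),
            ToolCall("create_booking", {"class_id": CLASS_ID}),
        ],
        expected_state={"free_slots": 0, "bookings": [CLASS_ID]},
    ),
    Scenario(
        name="class is full",
        caller_says="Can I book the Monday 9am class?",
        initial_state={"free_slots": 0, "bookings": []},
        expected_calls=[ToolCall("check_availability", {"class_id": CLASS_ID})],
        expected_state={"free_slots": 0, "bookings": []},
    ),
    Scenario(
        name="cancel an existing booking",
        caller_says="Please cancel my booking for Monday",
        initial_state={"free_slots": 0, "bookings": [CLASS_ID]},
        expected_calls=[ToolCall("cancel_booking", {"class_id": CLASS_ID})],
        expected_state={"free_slots": 1, "bookings": []},
    ),
]


def run_scenario(agent, scenario):
    """Return a list of failure messages; an empty list means the scenario passed."""
    calls, final_state = agent(
        scenario.caller_says, copy.deepcopy(scenario.initial_state)
    )
    failures = []
    if calls != scenario.expected_calls:
        failures.append(f"tool calls: expected {scenario.expected_calls}, got {calls}")
    if final_state != scenario.expected_state:
        failures.append(f"state: expected {scenario.expected_state}, got {final_state}")
    return failures


def run_suite(agent, scenarios):
    """Map each scenario name to its list of failures."""
    return {s.name: run_scenario(agent, s) for s in scenarios}


report = run_suite(stub_agent, SCENARIOS)
for name, failures in report.items():
    print("PASS" if not failures else "FAIL", name, *failures)
```

In a real pipeline the stub would be replaced by a call into the actual agent against a test database, and the deploy would be blocked whenever any scenario returns a non-empty failure list. The design choice worth noting is that each scenario asserts on both the tool calls and the resulting state, so a prompt change that produces a polite answer but books the wrong thing still fails.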

For the two studios the math was clear. Nowhere near 10k calls per day. No supervisor team to staff. Business logic too studio-specific for any flow builder. Build was the right call — but I bought roughly 70 percent of the stack from specialised vendors.

The right question for most tech leaders is not "build or buy." It is "which 70 percent are we buying, and is the remaining 30 percent the part of the system that should differentiate us?"

How did you draw the line on your last AI build decision?

P.S. New tech post every Wednesday.

#GenAI #AppliedAI #TechLeadership