Voice (IVR)

Tavio's voice channel combines Twilio for telephony with ElevenLabs for natural text-to-speech. Customers call your number and interact with your AI agent by voice.

Architecture

Inbound Call (Twilio) -> Speech-to-Text -> AI Pipeline -> TTS (ElevenLabs) -> Audio Playback (Twilio)

Twilio handles call routing, recording, and audio playback
ElevenLabs generates natural-sounding speech with voice customization
AI responses are capped at 150 tokens for spoken brevity
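
The stage order above can be sketched as a single function that handles one conversational turn. This is only an illustration, not Tavio's implementation: `handle_turn` and the three callables passed to it are hypothetical stand-ins for the speech-to-text, AI pipeline, and ElevenLabs TTS stages. The one value taken from this page is the 150-token cap.

```python
from typing import Callable

# Cap stated in this section; everything else here is a hypothetical stand-in.
MAX_RESPONSE_TOKENS = 150


def handle_turn(
    caller_audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str, int], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one turn: speech-to-text, then the AI pipeline, then TTS."""
    transcript = transcribe(caller_audio)
    reply = generate_reply(transcript, MAX_RESPONSE_TOKENS)
    return synthesize(reply)
```

Passing the stages in as callables keeps the sketch independent of any particular SDK, so each stage can be replaced with a stub during testing.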

Webhook URLs

Configure these in your Twilio phone number settings:

Inbound voice webhook

https://bankmind.aihookd.site/api/v1/voice/inbound

Status callback

https://bankmind.aihookd.site/api/v1/voice/status

Required Credentials

Service	Credential	Description
Twilio	account_sid	Twilio Account SID
Twilio	auth_token	Twilio Auth Token
Twilio	phone_number	Your Twilio phone number
ElevenLabs	api_key	ElevenLabs API key for TTS

TwiML Response Format

Tavio responds to Twilio with TwiML that plays the pre-generated audio, then listens for the caller's next spoken input:

TwiML response

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Play>https://bankmind.aihookd.site/api/v1/voice/audio/response_abc123</Play>
  <Gather input="speech" timeout="5" action="/api/v1/voice/inbound">
    <!-- Listen for next customer input -->
  </Gather>
</Response>
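
For readers who want to produce or test a response of this shape, here is a minimal sketch that builds the same document with Python's standard library. The function name `build_twiml` and its defaults are assumptions made for illustration. The element names, attributes, and action path are copied from the example above.

```python
import xml.etree.ElementTree as ET


def build_twiml(
    audio_url: str,
    action: str = "/api/v1/voice/inbound",
    timeout: int = 5,
) -> str:
    """Return TwiML that plays one audio file, then gathers speech."""
    response = ET.Element("Response")
    ET.SubElement(response, "Play").text = audio_url
    # An empty <Gather> listens for speech and posts the result to `action`.
    ET.SubElement(
        response,
        "Gather",
        {"input": "speech", "timeout": str(timeout), "action": action},
    )
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_twiml(
        "https://bankmind.aihookd.site/api/v1/voice/audio/response_abc123"
    ))
```

Building the document with an XML library rather than string formatting means characters such as `&` in the audio URL are escaped correctly.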

Voice Customization

ElevenLabs TTS supports voice customization. Configure the following parameters in Dashboard > Voice:

Parameter	Range	Description
pitch	-20 to +20 semitones	Adjust voice pitch up or down
speakingRate	0.5x to 2.0x	Speed of speech delivery
language	BCP-47 code	Language and accent (e.g., en-KE, sw-TZ)
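
If you manage these settings in your own tooling before entering them in the dashboard, a range check along these lines may help catch mistakes early. It is a sketch under assumptions: the `VoiceSettings` class, its field defaults, and the loose language-tag pattern are hypothetical. Only the ranges come from the table above.

```python
import re
from dataclasses import dataclass

# Loose shape check for a BCP-47 tag such as en-KE or sw-TZ.
# It does not validate against the full subtag registry.
_LANG_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


@dataclass(frozen=True)
class VoiceSettings:
    pitch: float = 0.0          # semitones; default is an assumption
    speaking_rate: float = 1.0  # multiplier; default is an assumption
    language: str = "en-KE"     # example tag taken from the table

    def __post_init__(self) -> None:
        if not -20 <= self.pitch <= 20:
            raise ValueError("pitch must be between -20 and +20 semitones")
        if not 0.5 <= self.speaking_rate <= 2.0:
            raise ValueError("speakingRate must be between 0.5x and 2.0x")
        if not _LANG_TAG.match(self.language):
            raise ValueError("language must look like a BCP-47 code")
```

For example, `VoiceSettings(pitch=-3, speaking_rate=1.1, language="sw-TZ")` is accepted, while `VoiceSettings(pitch=25)` raises `ValueError`.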

Call Flow

Customer dials your Twilio phone number
Twilio sends an HTTP POST to your inbound webhook
Tavio verifies the Twilio request signature
A welcome greeting is played via TTS
Customer speaks; Twilio captures speech input
Speech is transcribed and sent through the AI pipeline
AI response is converted to audio via ElevenLabs
Audio URL is returned in TwiML for playback
Loop continues until the customer hangs up
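
The signature check in the third step can be illustrated with Twilio's documented scheme for form-encoded POST requests: append each POST parameter name and value, sorted by name, to the full request URL, sign the result with HMAC-SHA1 using the Auth Token as the key, and compare the Base64 digest with the `X-Twilio-Signature` header. The sketch below follows that scheme. How Tavio implements the check internally is not stated here, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the expected signature for a form-encoded POST webhook."""
    # Full URL followed by each parameter name and value, sorted by name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Compare the computed signature with the header in constant time."""
    expected = compute_signature(auth_token, url, params)
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )
```

The URL passed in must match the webhook URL configured in Twilio exactly, including scheme and any query string, or the signatures will not match.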

Performance

TTS generation: 200-800ms (cached for repeated content)
Twilio playback: under 100ms latency
Total voice response: under 1 second for most messages
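
The caching of repeated content mentioned above could work roughly as follows. This is a sketch of one possible approach, not a description of Tavio's cache: `TTSCache`, its in-memory store, and the `synthesize` callable are all hypothetical, and a production cache would likely need expiry and shared storage.

```python
import hashlib
from typing import Callable, Dict


class TTSCache:
    """Reuse generated audio when the same text and voice are requested again."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize  # stand-in for the real TTS call
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(text: str, voice_id: str) -> str:
        # Collapse whitespace so trivially different strings share an entry.
        normalized = " ".join(text.split())
        return hashlib.sha256(f"{voice_id}\n{normalized}".encode("utf-8")).hexdigest()

    def get_audio(self, text: str, voice_id: str) -> bytes:
        cache_key = self.key(text, voice_id)
        if cache_key not in self._store:
            # Cache miss: pay the TTS generation cost once.
            self._store[cache_key] = self._synthesize(text, voice_id)
        return self._store[cache_key]
```

With a cache like this, a fixed welcome greeting incurs the generation latency only on the first call and is served from memory afterwards.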
