Voice (IVR)
Tavio's voice channel combines Twilio for telephony with ElevenLabs for natural text-to-speech. Customers call your number and interact with your AI agent by voice.
Architecture
Inbound Call (Twilio) -> Speech-to-Text -> AI Pipeline -> TTS (ElevenLabs) -> Audio Playback (Twilio)
- Twilio handles call routing, recording, and audio playback
- ElevenLabs generates natural-sounding speech with voice customization
- AI responses are capped at 150 tokens for spoken brevity
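
The stages above can be read as a simple chain of functions. The following is a minimal sketch of that chain, not Tavio's implementation: `VoicePipeline`, its three stage callables, and the stub functions are hypothetical names introduced for illustration, and whitespace-separated words stand in for real model tokens.

```python
from dataclasses import dataclass
from typing import Callable

MAX_RESPONSE_TOKENS = 150  # cap stated in the architecture notes


@dataclass
class VoicePipeline:
    # Each stage is a hypothetical callable standing in for a real service.
    transcribe: Callable[[bytes], str]         # speech-to-text
    generate_reply: Callable[[str, int], str]  # AI pipeline: (text, max_tokens)
    synthesize: Callable[[str], str]           # TTS: returns an audio URL

    def handle_turn(self, caller_audio: bytes) -> str:
        """Run one conversational turn and return the audio URL to play."""
        transcript = self.transcribe(caller_audio)
        reply = self.generate_reply(transcript, MAX_RESPONSE_TOKENS)
        return self.synthesize(reply)


# Stub stages so the sketch runs without any external service.
def stub_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_reply(text: str, max_tokens: int) -> str:
    # Crude approximation: treat whitespace-separated words as tokens.
    words = f"You said: {text}".split()
    return " ".join(words[:max_tokens])


def stub_synthesize(text: str) -> str:
    # Placeholder URL; a real TTS stage would upload audio and return its URL.
    return f"https://example.invalid/audio/{len(text)}"


pipeline = VoicePipeline(stub_transcribe, stub_reply, stub_synthesize)
```

Calling `pipeline.handle_turn(b"check my balance")` runs the three stages in order and returns the placeholder audio URL.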
Webhook URLs
Configure these in your Twilio phone number settings:
Inbound voice webhook
text
https://bankmind.aihookd.site/api/v1/voice/inbound
Status callback
text
https://bankmind.aihookd.site/api/v1/voice/status
Required Credentials
| Service | Credential | Description |
|---|---|---|
| Twilio | account_sid | Twilio Account SID |
| Twilio | auth_token | Twilio Auth Token |
| Twilio | phone_number | Your Twilio phone number |
| ElevenLabs | api_key | ElevenLabs API key for TTS |
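
If you prefer to set the webhook URLs programmatically rather than in the Twilio console, the Twilio REST API accepts `VoiceUrl` and `StatusCallback` on the IncomingPhoneNumbers resource. The sketch below only builds the request with the standard library. `build_webhook_update` is a hypothetical helper, and `number_sid` is assumed to be the SID of your phone number (it starts with `PN`), which is distinct from the `phone_number` credential above.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://bankmind.aihookd.site/api/v1/voice"


def build_webhook_update(account_sid: str, auth_token: str, number_sid: str) -> Request:
    """Build (but do not send) a request that points a Twilio number at the webhooks."""
    url = (
        "https://api.twilio.com/2010-04-01/Accounts/"
        f"{account_sid}/IncomingPhoneNumbers/{number_sid}.json"
    )
    body = urlencode({
        "VoiceUrl": f"{BASE_URL}/inbound",
        "VoiceMethod": "POST",
        "StatusCallback": f"{BASE_URL}/status",
        "StatusCallbackMethod": "POST",
    }).encode("ascii")
    # Twilio uses HTTP Basic auth with the Account SID and Auth Token.
    basic = base64.b64encode(f"{account_sid}:{auth_token}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the update. Keep the Auth Token out of source control and load it from your secrets store instead.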
TwiML Response Format
Tavio responds to Twilio with TwiML that plays pre-generated audio:
TwiML response
xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Play>https://bankmind.aihookd.site/api/v1/voice/audio/response_abc123</Play>
  <Gather input="speech" timeout="5" action="/api/v1/voice/inbound">
    <!-- Listen for next customer input -->
  </Gather>
</Response>
Voice Customization
ElevenLabs TTS supports voice customization. Configure the following parameters in Dashboard > Voice:
| Parameter | Range | Description |
|---|---|---|
| pitch | -20 to +20 semitones | Adjust voice pitch up or down |
| speakingRate | 0.5x to 2.0x | Speed of speech delivery |
| language | BCP-47 code | Language and accent (e.g., en-KE, sw-TZ) |
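
If you manage these settings in code before entering them in the dashboard, it can help to check them against the documented ranges first. The following is an illustrative sketch only: `VoiceSettings` is a hypothetical class, `speaking_rate` mirrors the `speakingRate` parameter, and the language check is a simplified pattern that covers tags such as `en-KE` but not the full BCP-47 grammar.

```python
import re
from dataclasses import dataclass

# Simplified shape check: a 2-3 letter language code, optionally followed by
# a 2-letter region. Full BCP-47 allows more forms than this.
_LANGUAGE_TAG = re.compile(r"^[a-z]{2,3}(-[A-Z]{2})?$")


@dataclass(frozen=True)
class VoiceSettings:
    pitch: float = 0.0          # semitones, -20 to +20
    speaking_rate: float = 1.0  # multiplier, 0.5 to 2.0
    language: str = "en-KE"     # BCP-47 code

    def __post_init__(self) -> None:
        if not -20 <= self.pitch <= 20:
            raise ValueError(f"pitch must be within -20..+20 semitones, got {self.pitch}")
        if not 0.5 <= self.speaking_rate <= 2.0:
            raise ValueError(f"speaking_rate must be within 0.5..2.0, got {self.speaking_rate}")
        if not _LANGUAGE_TAG.match(self.language):
            raise ValueError(f"language does not look like a BCP-47 code: {self.language!r}")
```

For example, `VoiceSettings(pitch=-3, speaking_rate=1.25, language="sw-TZ")` is accepted, while `VoiceSettings(pitch=21)` raises `ValueError`.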
Call Flow
- Customer dials your Twilio phone number
- Twilio sends an HTTP POST to your inbound webhook
- Tavio verifies the Twilio request signature
- A welcome greeting is played via TTS
- Customer speaks; Twilio captures speech input
- Speech is transcribed and sent through the AI pipeline
- AI response is converted to audio via ElevenLabs
- Audio URL is returned in TwiML for playback
- The capture, transcription, and playback steps repeat until the customer hangs up
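
Two steps in this flow are easy to sketch with the standard library: verifying the request signature and returning TwiML. Twilio signs each webhook by appending the POST parameters, sorted by name and concatenated as name plus value, to the full request URL, then computing an HMAC-SHA1 keyed with the Auth Token and sending the Base64 result in the `X-Twilio-Signature` header. The function names below are hypothetical, and this is not necessarily how Tavio implements either step.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET
from collections.abc import Mapping


def compute_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the signature Twilio would send for this URL and POST body."""
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Compare the expected signature with the X-Twilio-Signature header value."""
    expected = compute_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )


def build_twiml(audio_url: str, action: str = "/api/v1/voice/inbound") -> str:
    """Build a response that plays audio, then listens for the next utterance."""
    response = ET.Element("Response")
    ET.SubElement(response, "Play").text = audio_url
    ET.SubElement(
        response, "Gather", {"input": "speech", "timeout": "5", "action": action}
    )
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

The URL passed to `is_valid_request` must match the one Twilio called exactly, including scheme and any query string, so take care when the service sits behind a proxy that rewrites it.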
Performance
- TTS generation: 200-800ms (cached for repeated content)
- Twilio playback: under 100ms latency
- Total voice response: under 1 second for most messages