Text-to-Speech (TTS) Doc

Professional Voice Generation

Create professional IVR prompts, announcements, and voicemail greetings without recording studios or voice actors. Update instantly at any time.

Overview

Text-to-Speech (TTS) converts written text into natural-sounding speech using neural AI voices. It is well suited to dynamic IVR prompts, announcements, and greetings that need to be updated instantly without re-recording.

Key Benefits:

Cost Savings: Eliminate recording studio and voice actor costs
Instant Updates: Change prompts in seconds, not days
Multi-Language: Support 40+ languages with native accents
Consistency: Maintain consistent voice across all prompts
Personalization: Dynamic content based on caller information

Use Cases:

IVR menu prompts and instructions
Business hours and holiday announcements
Queue position and wait time messages
Voicemail greetings and away messages
Emergency notifications and alerts
Personalized caller greetings

Features

Neural AI Voices

Natural-Sounding Speech:

Advanced neural network technology (WaveNet, Azure Neural TTS)
Human-like intonation and emotion
Natural pauses and breathing patterns
Consistent pronunciation and clarity

Voice Catalog:

200+ Professional Voices
40+ Languages
Multiple accents per language
Male and female options
Various speaking styles (friendly, professional, casual, authoritative)

Popular Voice Examples:

English (US):
  👩 Jennifer - Warm, friendly customer service tone
  👩 Joanna - Professional, clear business voice
  👨 Matthew - Authoritative, confident narrator
  👨 Joey - Casual, conversational style

English (UK):
  👩 Emma - British, professional
  👨 Brian - British, formal

Spanish (ES):
  👩 Lucia - Neutral Spanish accent
  👨 Enrique - Professional, clear

French (FR):
  👩 Céline - Parisian accent
  👨 Mathieu - Professional French

German (DE):
  👩 Marlene - Standard German
  👨 Hans - Professional, clear

Dynamic Content

Variable Substitution:

Create prompts with dynamic content that changes based on context:

Text Input:
"Hello {{caller_name}}, thank you for calling {{company_name}}. 
Your account balance is {{account_balance}}."

TTS Output (for John Smith):
"Hello John Smith, thank you for calling Acme Corporation.
Your account balance is two hundred forty-five dollars and thirty cents."

Available Variables:

{{caller_name}} - Caller's name from contact
{{caller_number}} - Caller's phone number
{{company_name}} - Your company name
{{current_time}} - Current time
{{current_date}} - Current date
{{queue_position}} - Position in call queue
{{wait_time}} - Estimated wait time
{{account_balance}} - From CRM/database
{{agent_name}} - Assigned agent name
{{custom_field}} - Any custom data

Use Case Examples:

Personalized Greeting:
"Good {{time_of_day}}, {{caller_name}}. Welcome back to {{company_name}}."
Output: "Good afternoon, Sarah Johnson. Welcome back to Acme Corporation."

Queue Status:
"You are number {{queue_position}} in line. Estimated wait time is {{wait_time}} minutes."
Output: "You are number three in line. Estimated wait time is four minutes."

Account Info:
"Your order number {{order_number}} is {{order_status}} and will arrive on {{delivery_date}}."
Output: "Your order number A B C one two three four is shipped and will arrive on Friday, November twenty-fourth."
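
Substitution Sketch (Python):

The substitution described above can be illustrated with a few lines of Python. This is a minimal sketch and not the platform's own renderer: render_prompt and the fallbacks dictionary are hypothetical names, and only plain {{name}} placeholders are handled, not {{#if}} blocks.

```python
import re
from typing import Dict, Optional

# Matches plain placeholders such as {{caller_name}}. Block tags such as
# {{#if ...}} and {{/if}} contain '#' or '/' and are not matched.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(template: str, values: Dict[str, object],
                  fallbacks: Optional[Dict[str, str]] = None) -> str:
    """Replace each {{name}} with its value, else a fallback, else leave it as-is."""
    fallbacks = fallbacks or {}

    def substitute(match):
        name = match.group(1)
        value = values.get(name)
        if value is None or value == "":
            value = fallbacks.get(name)
        # Unknown variables stay visible so a preview reveals the gap.
        return match.group(0) if value is None else str(value)

    return PLACEHOLDER.sub(substitute, template)


template = "Good {{time_of_day}}, {{caller_name}}. Welcome back to {{company_name}}."
print(render_prompt(
    template,
    {"time_of_day": "afternoon", "company_name": "Acme Corporation"},
    fallbacks={"caller_name": "valued customer"},
))
# Good afternoon, valued customer. Welcome back to Acme Corporation.
```

Leaving unknown placeholders in place, rather than substituting an empty string, is a design choice of this sketch so that a missing data source is easy to spot during preview.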

SSML Control

Speech Synthesis Markup Language (SSML) provides fine-grained control over pronunciation, pacing, and emphasis:

Basic SSML Example:

<speak>
  Welcome to <emphasis level="strong">Acme Corporation</emphasis>.
  <break time="500ms"/>
  Please listen carefully as our menu options have changed.
</speak>

Common SSML Tags:

Pauses & Breaks:

<break time="500ms"/>  <!-- Half-second pause -->
<break time="1s"/>     <!-- One-second pause -->
<break strength="strong"/>  <!-- Sentence-level pause -->

Emphasis & Volume:

<emphasis level="strong">Important message</emphasis>
<emphasis level="moderate">Notice</emphasis>
<prosody volume="loud">Attention!</prosody>
<prosody volume="soft">Quiet message</prosody>

Speed & Pitch:

<prosody rate="slow">Speak slowly for clarity</prosody>
<prosody rate="fast">Quick information</prosody>
<prosody pitch="high">Higher voice pitch</prosody>
<prosody pitch="low">Lower voice pitch</prosody>

Number & Date Formatting:

<say-as interpret-as="digits">123456</say-as>
<!-- Output: "one two three four five six" -->

<say-as interpret-as="cardinal">123</say-as>
<!-- Output: "one hundred twenty-three" -->

<say-as interpret-as="ordinal">3</say-as>
<!-- Output: "third" -->

<say-as interpret-as="telephone">+1-555-123-4567</say-as>
<!-- Output: "plus one, five five five, one two three, four five six seven" -->

<say-as interpret-as="date" format="mdy">11/22/2025</say-as>
<!-- Output: "November twenty-second, two thousand twenty-five" -->

Spelling Out Words:

Your confirmation code is <say-as interpret-as="spell-out">ABC123</say-as>
<!-- Output: "Your confirmation code is A B C one two three" -->

Setup & Usage

Step 1: Access TTS in Admin Portal

Navigate to TTS:

Log into Cloud-PBX Admin Portal
Go to Settings → Text-to-Speech
Or access directly from IVR Builder → Add Prompt → Generate from Text

Step 2: Create Your First TTS Prompt

Simple Prompt Creation:

In IVR Builder:

Click Add Prompt or edit existing prompt
Select Text-to-Speech (instead of Upload Audio)

Enter your text:

Thank you for calling Acme Corporation. 
For sales, press 1. 
For support, press 2. 
For billing, press 3.

Choose voice: Joanna (English, US, Female, Professional)
Click Preview to hear sample
Click Save to generate and use in IVR

Generated Prompt:

Audio file automatically created
Stored in your prompt library
Ready to use immediately in IVR flows

Step 3: Advanced Options

Voice Settings:

Voice: Joanna (English US, Female)
Language: English (US) 🇺🇸
Speaking Rate: Normal (100%)
  - Slow: 75%
  - Normal: 100% 
  - Fast: 125%
Pitch: Normal (0)
  - Low: -2
  - Normal: 0
  - High: +2
Volume: Normal (0 dB)
  - Soft: -6 dB
  - Normal: 0 dB
  - Loud: +6 dB

Preview Options:

🔊 Listen to preview
📥 Download audio file
🔄 Regenerate with different settings
💾 Save to prompt library

Step 4: Use SSML for Advanced Control

Enable SSML Mode:

Toggle Advanced Mode → SSML Enabled
Enter SSML markup instead of plain text
Preview to verify pronunciation and timing
Save when satisfied

Example SSML Prompt:

<speak>
  <emphasis level="strong">Welcome</emphasis> to Acme Corporation.
  <break time="500ms"/>
  Please listen carefully, as our menu options have changed.
  <break time="300ms"/>
  
  For <prosody rate="slow">sales</prosody>, press <say-as interpret-as="digits">1</say-as>.
  <break time="300ms"/>
  
  For <prosody rate="slow">technical support</prosody>, press <say-as interpret-as="digits">2</say-as>.
  <break time="300ms"/>
  
  For <prosody rate="slow">billing</prosody>, press <say-as interpret-as="digits">3</say-as>.
  <break time="500ms"/>
  
  To repeat these options, press <say-as interpret-as="digits">9</say-as>.
</speak>
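
Generating Menu SSML (Python sketch):

When a menu has many options, the SSML above could be assembled from a list rather than typed by hand. The following is a sketch under assumptions: build_menu_ssml is a hypothetical helper, not part of the Admin Portal, and it uses only the tags shown in this section. Labels are XML-escaped so that characters such as "&" do not break the markup.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from xml.sax.saxutils import escape


def build_menu_ssml(greeting: str, options: List[Tuple[str, str]],
                    pause_ms: int = 300) -> str:
    """Return SSML with a greeting and one 'For X, press N.' line per option."""
    lines = ["<speak>", f"  {escape(greeting)}", '  <break time="500ms"/>']
    for label, key in options:
        lines.append(
            f'  For <prosody rate="slow">{escape(label)}</prosody>, '
            f'press <say-as interpret-as="digits">{escape(key)}</say-as>.'
        )
        lines.append(f'  <break time="{pause_ms}ms"/>')
    lines.append("</speak>")
    return "\n".join(lines)


ssml = build_menu_ssml(
    "Welcome to Acme Corporation.",
    [("sales", "1"), ("billing & accounts", "3")],
)
ET.fromstring(ssml)  # raises ParseError if the markup were malformed
print(ssml)
```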

Step 5: Dynamic Prompts with Variables

Enable Dynamic Content:

In prompt editor, toggle Dynamic Content → Enabled
Use {{variable_name}} syntax for placeholders
Configure variable sources (CRM, database, caller info)
Preview with sample data

Example Dynamic Prompt:

Good {{time_of_day}}, {{caller_name}}. 

Thank you for calling {{company_name}}. 

{{#if has_open_ticket}}
I see you have an open support ticket, number {{ticket_number}}, 
regarding {{ticket_subject}}. 

Press 1 to speak with {{assigned_agent}}, or press 2 for the main menu.
{{else}}
For sales, press 1. For support, press 2.
{{/if}}

Variable Configuration:

Variable: {{caller_name}}
Source: Caller ID Lookup → CRM Contact
Fallback: "valued customer"

Variable: {{time_of_day}}
Source: System Time
  6am-12pm: "morning"
  12pm-5pm: "afternoon"
  5pm-9pm: "evening"

Variable: {{has_open_ticket}}
Source: CRM API Query
Query: "SELECT COUNT(*) FROM tickets WHERE phone = {{caller_number}} AND status = 'Open'"
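
Variable Sources (Python sketch):

The two computed variables above can be sketched as follows. Everything here is illustrative: the function names are hypothetical, an in-memory SQLite table stands in for the CRM, and the caller's number is passed as a bound parameter rather than substituted into the query text, which is the usual way to avoid SQL injection. No greeting word is listed for times outside 6am-9pm, so the sketch returns None for those and leaves the choice to the caller.

```python
import sqlite3
from datetime import time
from typing import Optional


def time_of_day(now: time) -> Optional[str]:
    """Map a clock time to a greeting word using the ranges listed above."""
    if time(6) <= now < time(12):
        return "morning"
    if time(12) <= now < time(17):
        return "afternoon"
    if time(17) <= now < time(21):
        return "evening"
    return None  # outside 6am-9pm no value is listed


def has_open_ticket(conn: sqlite3.Connection, caller_number: str) -> bool:
    """Count open tickets using a bound parameter instead of string substitution."""
    row = conn.execute(
        "SELECT COUNT(*) FROM tickets WHERE phone = ? AND status = 'Open'",
        (caller_number,),
    ).fetchone()
    return row[0] > 0


# In-memory stand-in for the CRM so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (phone TEXT, status TEXT)")
conn.execute("INSERT INTO tickets VALUES ('+15551234567', 'Open')")
print(time_of_day(time(14, 30)), has_open_ticket(conn, "+15551234567"))
# afternoon True
```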

Use Cases & Examples

Scenario: Replace outdated recorded IVR with modern TTS

Traditional Approach:

Hire voice actor: $300-500
Studio recording: $200-400
Editing and mastering: $100-200
Total: $600-1,100
Update time: 2-5 business days

TTS Approach:

Text entry: 5 minutes
Voice selection: 2 minutes
Preview and adjust: 3 minutes
Total cost: ~$0.15
Update time: 10 minutes

TTS Script:

<speak>
  <emphasis level="strong">Welcome</emphasis> to Acme Corporation.
  <break time="500ms"/>
  
  For <prosody rate="slow">sales and new orders</prosody>, 
  press <say-as interpret-as="digits">1</say-as>.
  <break time="400ms"/>
  
  For <prosody rate="slow">customer support and technical assistance</prosody>, 
  press <say-as interpret-as="digits">2</say-as>.
  <break time="400ms"/>
  
  For <prosody rate="slow">billing and account questions</prosody>, 
  press <say-as interpret-as="digits">3</say-as>.
  <break time="400ms"/>
  
  To hear these options again, press <say-as interpret-as="digits">9</say-as>.
  <break time="500ms"/>
  
  Or, stay on the line for the next available representative.
</speak>

Result: A professional, clear IVR that can be updated at any time for cents instead of hundreds of dollars.

Queue & Wait Time Messages

Dynamic Queue Status:

Thank you for your patience. 

You are currently number {{queue_position}} in line. 

Estimated wait time is {{wait_time}} minutes.

{{#if queue_position > 5}}
To receive a callback when an agent is available, press 1.
{{else}}
An agent will be with you shortly.
{{/if}}

To return to the main menu, press the star key.

Real-Time Variables:

{{queue_position}}: Updated in real-time as queue moves
{{wait_time}}: Calculated based on average handle time
Queue position > 5: Offer callback option
Regenerated automatically with current values

Business Hours Announcement

Smart Hours Message:

{{#if is_business_hours}}
  Thank you for calling Acme Corporation. 
  Our business hours are Monday through Friday, 9 AM to 6 PM Eastern Time.
  All of our representatives are currently assisting other customers.
  Please hold, and the next available agent will be with you shortly.
  
{{else if is_weekend}}
  Thank you for calling Acme Corporation.
  You have reached us outside of our normal business hours.
  Our office is open Monday through Friday, 9 AM to 6 PM Eastern Time.
  Please leave a message after the tone, and we will return your call on the next business day.
  For urgent matters, press 9 to reach our emergency support line.
  
{{else if is_holiday}}
  Thank you for calling Acme Corporation.
  Our office is closed today for {{holiday_name}}.
  We will reopen on {{next_business_day}} at 9 AM Eastern Time.
  For urgent matters, please press 9 to reach our emergency support line.
  Otherwise, please leave a message, and we will return your call when we reopen.
{{/if}}

Automatic Schedule Updates:

Business hours from calendar
Holiday schedule from admin settings
Next business day calculated automatically
No manual prompt updates needed
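
Schedule Check (Python sketch):

One way the flags used in the message above might be derived is shown below. This is a sketch, not the platform's scheduling logic: pick_announcement is a hypothetical function, the datetime is assumed to already be in the office's local time, and the 9 AM to 6 PM weekday hours come from the example message. The template above shows no branch for weekday calls after 6 PM, so the "after_hours" result is an assumption about how such calls could be classified.

```python
from datetime import date, datetime, time
from typing import Dict


def pick_announcement(now: datetime, holidays: Dict[date, str]) -> str:
    """Return 'holiday', 'weekend', 'open', or 'after_hours' for a local datetime."""
    if now.date() in holidays:
        return "holiday"
    if now.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    if time(9) <= now.time() < time(18):
        return "open"
    return "after_hours"


holidays = {date(2025, 12, 25): "Christmas Day"}
print(pick_announcement(datetime(2025, 11, 24, 10, 0), holidays))  # open
print(pick_announcement(datetime(2025, 12, 25, 10, 0), holidays))  # holiday
```

Checking the holiday calendar first means a holiday that falls on a weekend still gets the holiday message.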

Personalized Caller Greeting

VIP Customer Recognition:

Welcome back, {{caller_name}}. 

Thank you for being a valued {{account_tier}} member since {{customer_since}}.

{{#if has_recent_order}}
Your recent order, number {{order_number}}, is {{order_status}}.
{{#if order_status == 'shipped'}}
Tracking shows delivery expected {{delivery_date}}.
{{/if}}
{{/if}}

{{#if has_assigned_account_manager}}
To speak with your dedicated account manager, {{manager_name}}, press 1.
Otherwise, press 2 for our main menu.
{{else}}
For sales, press 1. For support, press 2. For billing, press 3.
{{/if}}

Data Sources:

CRM: Account tier, customer since date, account manager
Order system: Recent orders, status, delivery tracking
Smart routing based on customer relationship

Multi-Language Support

Language Selection with Regional Voices:

English (US):

Voice: Joanna (Professional US Female)
"Welcome to Acme Corporation. For English, press 1. 
Para español, oprima el dos."

Spanish (ES):

Voice: Lucia (Professional Spanish Female)
"Bienvenido a Acme Corporation. Para continuar en español, 
oprima el uno. For English, press two."

French (FR):

Voice: Céline (Professional French Female)
"Bienvenue chez Acme Corporation. Pour continuer en français, 
appuyez sur le un. For English, press two."

Implementation:

Language detection from caller ID or IVR selection
Switch voices based on chosen language
Consistent prompts across all languages
Update all language versions simultaneously

Best Practices

Writing Effective TTS Scripts

Do's:

✅ Write conversationally (how people speak, not write)
✅ Use short sentences and phrases
✅ Spell out numbers for clarity ("one" not "1")
✅ Use punctuation for natural pauses
✅ Test pronunciation with preview before deploying
✅ Consider caller's perspective and information needs

Don'ts:

❌ Don't use overly formal or complex language
❌ Don't write run-on sentences (listeners can't rewind)
❌ Don't assume pronunciation (spell phonetically if needed)
❌ Don't overuse emphasis or special effects
❌ Don't forget pauses between menu options

Pronunciation Control

Common Challenges:

Acronyms & Abbreviations:

<!-- Wrong: TTS might read "PBX" as a word, such as "Pibix" -->
PBX system

<!-- Right: Force spelling -->
<say-as interpret-as="spell-out">PBX</say-as> system
<!-- Output: "P B X system" -->

Company & Product Names:

<!-- If mispronounced, use phonetic spelling -->
<!-- Wrong pronunciation -->
Acme

<!-- Force phonetic spelling -->
<phoneme alphabet="ipa" ph="ˈæk.mi">Acme</phoneme>

<!-- Or spell it out -->
<say-as interpret-as="spell-out">ACME</say-as>

Numbers & Codes:

<!-- Account numbers: spell out -->
Your account number is <say-as interpret-as="digits">123456</say-as>
<!-- Output: "one two three four five six" -->

<!-- Prices: use currency -->
Your balance is <say-as interpret-as="currency" language="en-US">$245.30</say-as>
<!-- Output: "two hundred forty-five dollars and thirty cents" -->

<!-- Phone numbers -->
<say-as interpret-as="telephone">555-1234</say-as>
<!-- Output: "five five five, one two three four" -->

Voice Selection Tips

Match Voice to Use Case:

Customer Service IVR:
  Best: Joanna (US), Emma (UK) - Professional, friendly
  Avoid: Joey - Too casual for business

Emergency/Security Alerts:
  Best: Matthew (US), Brian (UK) - Authoritative, clear
  Avoid: Soft voices - May lack urgency

Marketing/Sales Prompts:
  Best: Jennifer (US), Amy (UK) - Warm, engaging
  Avoid: Overly formal voices

Technical Instructions:
  Best: Joanna (US), Emma (UK) - Clear, moderate pace
  Avoid: Fast-paced voices

Multi-Lingual Support:
  Best: Native accent voices for each language
  Avoid: English voice attempting other languages

Performance Optimization

Caching & Pre-Generation:

Static Prompts (rarely change):

Generate Once, Cache Forever:
  - Main IVR menu
  - Company greeting
  - Standard instructions
  
Benefits:
  ✅ Instant playback (no generation delay)
  ✅ No usage fees after initial generation
  ✅ Consistent experience

Dynamic Prompts (change per call):

Generate On-Demand:
  - Personalized greetings with {{caller_name}}
  - Queue position messages with {{queue_position}}
  - Account-specific information
  
Considerations:
  ⏱️ 100-300ms generation latency
  💰 Per-generation usage fees
  🔄 Short cache TTL (5-10 seconds)

Hybrid Approach:

Pre-generate templates with common variables:
  "Welcome back, {{caller_name}}" → Cache multiple versions
  "You are number [X] in line" → Pre-generate 1-20
  
Result: Fast playback + personalization

Pricing

Usage-Based Pricing

Standard Voices:

Cost: $4 per 1 million characters
Per-character: ~$0.000004

Examples:
  "Thank you for calling." (22 chars) = $0.000088
  Full IVR menu (500 chars) = $0.002
  1,000 uncached menu plays/month = $2.00/month

Neural Voices (Recommended):

Cost: $16 per 1 million characters
Per-character: ~$0.000016

Examples:
  "Thank you for calling." (22 chars) = $0.000352
  Full IVR menu (500 chars) = $0.008
  1,000 uncached menu plays/month = $8.00/month

SSML Characters:

SSML markup does NOT count toward character usage.

Example:
<speak>
  <emphasis>Hello</emphasis> <break time="500ms"/> world
</speak>

Billable characters: 11 ("Hello world")
Non-billable: SSML tags (<speak>, <emphasis>, <break>)
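
Estimating Billable Characters (Python sketch):

The counting rule above can be approximated offline before a prompt is submitted. This is an estimate under assumptions, not the billing system's logic: billable_chars and cost_usd are hypothetical helpers, and the sketch assumes that runs of whitespace between tags count as a single space, which matches the 11-character example above.

```python
import xml.etree.ElementTree as ET

RATE_PER_MILLION = {"standard": 4.00, "neural": 16.00}  # USD, from the rates above


def billable_chars(ssml: str) -> int:
    """Count spoken characters: drop tags, collapse whitespace runs to one space."""
    root = ET.fromstring(ssml)
    spoken = " ".join("".join(root.itertext()).split())
    return len(spoken)


def cost_usd(chars: int, voice_type: str = "neural") -> float:
    """Cost of generating `chars` billable characters once."""
    return chars * RATE_PER_MILLION[voice_type] / 1_000_000


example = '<speak>\n  <emphasis>Hello</emphasis> <break time="500ms"/> world\n</speak>'
print(billable_chars(example))            # 11
print(f"{cost_usd(500, 'neural'):.3f}")   # 0.008
```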

Cost Optimization

Strategies to Reduce Costs:

1. Cache Static Prompts:

Generate once, use unlimited times:
  Main menu: Generate once = $0.008
  Played 10,000 times = Still $0.008 total
  
Savings: 99.99% vs. generating every time

2. Use Standard Voices for Non-Critical Prompts:

Neural voice: $0.008 per menu
Standard voice: $0.002 per menu
  
For internal or hold-music announcements where quality is less critical
Savings: 75%

3. Optimize Text Length:

❌ Wordy: "We would like to take this opportunity to thank you for taking the time to call our company today." (98 chars)

✅ Concise: "Thank you for calling." (22 chars)

Savings: 78% fewer characters = 78% lower cost

4. Pre-Generate Common Variables:

Instead of: "You are number {{queue_position}} in line"
Pre-generate: "You are number 1 in line" through "You are number 20 in line"
  
Generate 20 versions once vs. thousands of dynamic generations
Savings: 95%+ for high-traffic queues
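
Savings Arithmetic (Python sketch):

The savings figures in this section follow from comparing how many times a prompt is generated. The short calculation below works through the cached-menu case, using the neural rate and the 500-character menu from the pricing examples. The function names are hypothetical and the play count is illustrative.

```python
def generation_cost(chars: int, generations: int, usd_per_million: float = 16.0) -> float:
    """Cost of generating a prompt of `chars` characters `generations` times."""
    return chars * generations * usd_per_million / 1_000_000


def savings_percent(baseline: float, optimized: float) -> float:
    """Percentage saved by the optimized cost relative to the baseline."""
    return 100.0 * (baseline - optimized) / baseline


every_play = generation_cost(500, 10_000)  # regenerate the menu on each of 10,000 plays
cached = generation_cost(500, 1)           # generate once, replay from cache
print(f"{every_play:.2f} {cached:.3f} {savings_percent(every_play, cached):.2f}%")
# 80.00 0.008 99.99%
```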

Troubleshooting

Pronunciation Issues

Problem: TTS mispronounces company name, product, or acronym

Solutions:

1. Phonetic Spelling:

Wrong: "TheVoĉo" read with the engine's default guess
Right: the pronunciation given in the ph attribute below ("vo-ko")

Fix: Phonetic helper
<phoneme alphabet="ipa" ph="ˈvoʊ.koʊ">TheVoĉo</phoneme>

2. Spell Out:

<say-as interpret-as="spell-out">TheVoĉo</say-as>
<!-- Output: the name read letter by letter -->

3. Alternative Spelling:

If "Acme" is mispronounced, respell it the way it should sound.
Try: "Ackmee" or "Ack-me"

4. Build Custom Lexicon:

<!-- Admin → TTS → Custom Pronunciations -->
Word: SQL
Pronunciation: "sequel" or "S Q L"

Word: Kubernetes  
Pronunciation: "koo-ber-net-eez"

Voice Quality Issues

Problem: TTS voice sounds robotic or unnatural

Solutions:

1. Upgrade to Neural Voices:

Standard Voice → Neural Voice
Cost: 4x more, but significantly more natural

2. Add SSML Prosody:

<!-- Add natural variation -->
<prosody rate="95%" pitch="-1st">
  Welcome to our company.
</prosody>

3. Use Punctuation:

❌ "Welcome to Acme Corporation for sales press 1 for support press 2"

✅ "Welcome to Acme Corporation. For sales, press 1. For support, press 2."

4. Add Breaks:

<speak>
  Welcome to Acme Corporation.
  <break time="500ms"/>
  For sales, press 1.
  <break time="300ms"/>
  For support, press 2.
</speak>

Generation Failures

Problem: TTS generation fails or takes too long

Diagnostic:

Check TTS Dashboard:
  - API Status: ✅ Operational
  - Queue Depth: 0 requests
  - Average Generation Time: 250ms
  - Error Rate: 0.0%

Common Causes:

1. Invalid SSML:

<!-- Wrong: Unclosed tag -->
<speak>
  <emphasis>Hello
</speak>

<!-- Right: Properly closed -->
<speak>
  <emphasis>Hello</emphasis>
</speak>
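
Checking SSML Before Submitting (Python sketch):

Unclosed tags like the one above can be caught locally with any XML parser. The sketch below uses Python's standard library; ssml_error is a hypothetical helper. It only checks that the markup is well-formed and rooted at <speak>. It does not know which tags or attributes the TTS engine supports.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def ssml_error(ssml: str) -> Optional[str]:
    """Return None if the markup is well-formed XML rooted at <speak>, else a message."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    if root.tag != "speak":
        return f"root element is <{root.tag}>, expected <speak>"
    return None


print(ssml_error("<speak>\n  <emphasis>Hello\n</speak>"))          # a parse error message
print(ssml_error("<speak><emphasis>Hello</emphasis></speak>"))     # None
```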

2. Text Too Long:

Max characters: 3,000 per generation
Solution: Split into multiple prompts
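
Splitting Long Text (Python sketch):

Splitting is best done at sentence boundaries so that each generated clip ends on a natural pause. The sketch below is one possible approach for plain text only; split_prompt is a hypothetical helper, and applying it to SSML could cut through a tag.

```python
import re
from typing import List

MAX_CHARS = 3000  # per-generation limit stated above


def split_prompt(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split plain text into chunks of at most `limit` characters at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut as a last resort.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_prompt("One. Two. Three.", limit=9))  # ['One. Two.', 'Three.']
```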

3. Unsupported Characters:

Remove special characters: ©, ™, ®, emoji
Use: (c), (TM), (R), emoji described in words
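
Cleaning Input Text (Python sketch):

The replacements above can be applied automatically before text is sent for generation. The sketch below is illustrative: sanitize_for_tts is a hypothetical helper, it drops emoji rather than describing them in words (that would need a lookup table), and it relies on the Unicode "So" category, which covers most but not all emoji.

```python
import unicodedata

REPLACEMENTS = {"©": "(c)", "™": "(TM)", "®": "(R)"}


def sanitize_for_tts(text: str) -> str:
    """Swap the symbols listed above for ASCII and drop emoji-like characters."""
    for symbol, replacement in REPLACEMENTS.items():
        text = text.replace(symbol, replacement)
    kept = []
    for ch in text:
        # Category "So" (Symbol, other) covers most emoji; U+FE0F and U+200D are
        # the variation selector and joiner that emoji sequences leave behind.
        if unicodedata.category(ch) == "So" or ch in "\ufe0f\u200d":
            continue
        kept.append(ch)
    # Collapse the gaps left by removed characters into single spaces.
    return " ".join("".join(kept).split())


print(sanitize_for_tts("Acme™ support 🎉 © 2025"))  # Acme(TM) support (c) 2025
```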

4. Rate Limiting:

Limit: 100 requests per minute
Solution: Implement request queuing or upgrade plan
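
Request Queuing (Python sketch):

Client-side queuing can be as simple as waiting until a slot is free before each request. The sketch below assumes the 100-per-minute limit is enforced over a rolling 60-second window, which this document does not specify; SlidingWindowLimiter is a hypothetical class and is not thread-safe.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.limit, self.window = limit, window
        self._clock, self._sleep = clock, sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def acquire(self) -> None:
        """Block until a slot is free, then record the request."""
        delay = self.wait_time()
        while delay > 0:
            self._sleep(delay)
            delay = self.wait_time()
        self._sent.append(self._clock())


limiter = SlidingWindowLimiter()
limiter.acquire()  # call this before each TTS generation request
print(limiter.wait_time())  # 0.0 while fewer than 100 requests are in the window
```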

Advanced Features

Voice Cloning (Enterprise)

Custom Voice Creation:

Create a custom voice based on your company spokesperson or brand:

Process:

Record 30-60 minutes of high-quality audio
Submit audio for voice training (2-4 weeks)
Custom voice becomes available in TTS engine
Use custom voice across all IVR prompts

Benefits:

Consistent brand voice across all channels
Professional spokesperson without ongoing recording costs
Update prompts anytime in spokesperson's voice
Multi-language support with same voice characteristics

Cost: Contact sales for pricing (Enterprise plan)

A/B Testing

Test Voice Effectiveness:

Scenario: Which voice/script converts better?

Version A:

Voice: Matthew (Authoritative Male)
"Press 1 for sales."

Version B:

Voice: Jennifer (Friendly Female)
"If you'd like to speak with our sales team, press 1."

Metrics:

Conversion rate (press 1 vs. hang up)
Average time to decision
Caller satisfaction
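
Assigning Variants (Python sketch):

A test like this needs a stable way to decide which version each caller hears, and a way to compute the conversion metric afterwards. The sketch below shows one possible approach; assign_variant and conversion_rate are hypothetical helpers, and the counts in the usage line are illustrative, not measurements.

```python
import hashlib
from typing import Sequence


def assign_variant(caller_number: str, variants: Sequence[str] = ("A", "B")) -> str:
    """Map a caller to a variant deterministically so repeat callers hear the same prompt."""
    digest = hashlib.sha256(caller_number.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]


def conversion_rate(pressed_one: int, total_calls: int) -> float:
    """Share of calls in which the caller pressed 1 rather than hanging up."""
    return pressed_one / total_calls if total_calls else 0.0


variant = assign_variant("+15551234567")   # the same number always gets the same variant
print(variant in ("A", "B"))               # True
print(conversion_rate(45, 150))            # 0.3
```

Hashing the caller's number, rather than choosing at random on every call, keeps the experience consistent for repeat callers and needs no stored state.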