TheVoĉo

Professional Voice Generation

Create professional IVR prompts, announcements, and voicemail greetings without recording studios or voice actors. Update instantly at any time.

Overview

Text-to-Speech (TTS) converts written text into natural-sounding speech using neural AI voices. It is well suited to dynamic IVR prompts, announcements, and greetings that need to be updated instantly without re-recording.

Key Benefits:

  • Cost Savings: Eliminate recording studio and voice actor costs
  • Instant Updates: Change prompts in seconds, not days
  • Multi-Language: Support 40+ languages with native accents
  • Consistency: Maintain consistent voice across all prompts
  • Personalization: Dynamic content based on caller information

Use Cases:

  • IVR menu prompts and instructions
  • Business hours and holiday announcements
  • Queue position and wait time messages
  • Voicemail greetings and away messages
  • Emergency notifications and alerts
  • Personalized caller greetings

Features

Neural AI Voices

Natural-Sounding Speech:

  • Advanced neural network technology (WaveNet, Azure Neural TTS)
  • Human-like intonation and emotion
  • Natural pauses and breathing patterns
  • Consistent pronunciation and clarity

Voice Catalog:

200+ Professional Voices
40+ Languages
Multiple accents per language
Male and female options
Various speaking styles (friendly, professional, casual, authoritative)

Popular Voice Examples:

English (US):
  👩 Jennifer - Warm, friendly customer service tone
  👩 Joanna - Professional, clear business voice
  👨 Matthew - Authoritative, confident narrator
  👨 Joey - Casual, conversational style

English (UK):
  👩 Emma - British, professional
  👨 Brian - British, formal

Spanish (ES):
  👩 Lucia - Neutral Spanish accent
  👨 Enrique - Professional, clear

French (FR):
  👩 Céline - Parisian accent
  👨 Mathieu - Professional French

German (DE):
  👩 Marlene - Standard German
  👨 Hans - Professional, clear

Dynamic Content

Variable Substitution:

Create prompts with dynamic content that changes based on context:

Text Input:
"Hello {{caller_name}}, thank you for calling {{company_name}}. 
Your account balance is {{account_balance}}."

TTS Output (for John Smith):
"Hello John Smith, thank you for calling Acme Corporation.
Your account balance is two hundred forty-five dollars and thirty cents."

Available Variables:

{{caller_name}} - Caller's name from contact
{{caller_number}} - Caller's phone number
{{company_name}} - Your company name
{{current_time}} - Current time
{{current_date}} - Current date
{{queue_position}} - Position in call queue
{{wait_time}} - Estimated wait time
{{account_balance}} - From CRM/database
{{agent_name}} - Assigned agent name
{{custom_field}} - Any custom data

Use Case Examples:

Personalized Greeting:
"Good {{time_of_day}}, {{caller_name}}. Welcome back to {{company_name}}."
Output: "Good afternoon, Sarah Johnson. Welcome back to Acme Corporation."

Queue Status:
"You are number {{queue_position}} in line. Estimated wait time is {{wait_time}} minutes."
Output: "You are number three in line. Estimated wait time is four minutes."

Account Info:
"Your order number {{order_number}} is {{order_status}} and will arrive on {{delivery_date}}."
Output: "Your order number A B C one two three four is shipped and will arrive on Friday, November twenty-fourth."
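The substitution shown in these examples can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's implementation: the function name render_prompt and the fallback behaviour are assumptions, and the sketch handles simple {{name}} placeholders only (not {{#if}} blocks or number-to-words conversion).

```python
import re
from typing import Dict, Optional

# Matches simple placeholders such as {{caller_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str,
                  values: Dict[str, str],
                  fallbacks: Optional[Dict[str, str]] = None) -> str:
    """Replace {{name}} placeholders with values, then fallbacks (hypothetical helper)."""
    fallbacks = fallbacks or {}

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in values:
            return str(values[name])
        if name in fallbacks:
            return fallbacks[name]
        # Leave unknown placeholders visible so they are easy to spot in a preview.
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)


template = "Good {{time_of_day}}, {{caller_name}}. Welcome back to {{company_name}}."
greeting = render_prompt(
    template,
    {"time_of_day": "afternoon", "company_name": "Acme Corporation"},
    fallbacks={"caller_name": "valued customer"},
)
print(greeting)
```

Leaving unresolved placeholders in the text, rather than dropping them silently, makes a missing data source obvious when the prompt is previewed.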

SSML Control

Speech Synthesis Markup Language (SSML) provides fine-grained control over pronunciation, pacing, and emphasis:

Basic SSML Example:

<speak>
  Welcome to <emphasis level="strong">Acme Corporation</emphasis>.
  <break time="500ms"/>
  Please listen carefully as our menu options have changed.
</speak>

Common SSML Tags:

Pauses & Breaks:

<break time="500ms"/>  <!-- Half-second pause -->
<break time="1s"/>     <!-- One-second pause -->
<break strength="strong"/>  <!-- Sentence-level pause -->

Emphasis & Volume:

<emphasis level="strong">Important message</emphasis>
<emphasis level="moderate">Notice</emphasis>
<prosody volume="loud">Attention!</prosody>
<prosody volume="soft">Quiet message</prosody>

Speed & Pitch:

<prosody rate="slow">Speak slowly for clarity</prosody>
<prosody rate="fast">Quick information</prosody>
<prosody pitch="high">Higher voice pitch</prosody>
<prosody pitch="low">Lower voice pitch</prosody>

Number & Date Formatting:

<say-as interpret-as="digits">123456</say-as>
<!-- Output: "one two three four five six" -->

<say-as interpret-as="cardinal">123</say-as>
<!-- Output: "one hundred twenty-three" -->

<say-as interpret-as="ordinal">3</say-as>
<!-- Output: "third" -->

<say-as interpret-as="telephone">+1-555-123-4567</say-as>
<!-- Output: "plus one, five five five, one two three, four five six seven" -->

<say-as interpret-as="date" format="mdy">11/22/2025</say-as>
<!-- Output: "November twenty-second, two thousand twenty-five" -->

Spelling Out Words:

Your confirmation code is <say-as interpret-as="spell-out">ABC123</say-as>
<!-- Output: "Your confirmation code is A B C one two three" -->

Setup & Usage

Step 1: Access TTS in Admin Portal

Navigate to TTS:

  1. Log into Cloud-PBX Admin Portal
  2. Go to Settings → Text-to-Speech
  3. Or access directly from IVR Builder → Add Prompt → Generate from Text

Step 2: Create Your First TTS Prompt

Simple Prompt Creation:

In IVR Builder:

  1. Click Add Prompt or edit existing prompt
  2. Select Text-to-Speech (instead of Upload Audio)
  3. Enter your text:
    Thank you for calling Acme Corporation. 
    For sales, press 1. 
    For support, press 2. 
    For billing, press 3.
  4. Choose voice: Joanna (English, US, Female, Professional)
  5. Click Preview to hear sample
  6. Click Save to generate and use in IVR

Generated Prompt:

  • Audio file automatically created
  • Stored in your prompt library
  • Ready to use immediately in IVR flows

Step 3: Advanced Options

Voice Settings:

Voice: Joanna (English US, Female)
Language: English (US) 🇺🇸
Speaking Rate: Normal (100%)
  - Slow: 75%
  - Normal: 100% 
  - Fast: 125%
Pitch: Normal (0)
  - Low: -2
  - Normal: 0
  - High: +2
Volume: Normal (0 dB)
  - Soft: -6 dB
  - Normal: 0 dB
  - Loud: +6 dB

Preview Options:

  • 🔊 Listen to preview
  • 📥 Download audio file
  • 🔄 Regenerate with different settings
  • 💾 Save to prompt library

Step 4: Use SSML for Advanced Control

Enable SSML Mode:

  1. Toggle Advanced Mode → SSML Enabled
  2. Enter SSML markup instead of plain text
  3. Preview to verify pronunciation and timing
  4. Save when satisfied

Example SSML Prompt:

<speak>
  <emphasis level="strong">Welcome</emphasis> to Acme Corporation.
  <break time="500ms"/>
  Please listen carefully, as our menu options have changed.
  <break time="300ms"/>
  
  For <prosody rate="slow">sales</prosody>, press <say-as interpret-as="digits">1</say-as>.
  <break time="300ms"/>
  
  For <prosody rate="slow">technical support</prosody>, press <say-as interpret-as="digits">2</say-as>.
  <break time="300ms"/>
  
  For <prosody rate="slow">billing</prosody>, press <say-as interpret-as="digits">3</say-as>.
  <break time="500ms"/>
  
  To repeat these options, press <say-as interpret-as="digits">9</say-as>.
</speak>
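If menu options are kept in a list, an SSML prompt like the one above could be assembled programmatically. The sketch below is a hypothetical helper (build_menu_ssml is not part of the product); it escapes XML special characters so that a name such as "Smith & Sons" does not produce invalid markup.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape


def build_menu_ssml(company: str, options: Iterable[Tuple[str, str]]) -> str:
    """Build an SSML menu prompt from (digit, label) pairs (illustrative helper)."""
    lines = [
        "<speak>",
        f'  <emphasis level="strong">Welcome</emphasis> to {escape(company)}.',
        '  <break time="500ms"/>',
    ]
    for digit, label in options:
        lines.append(
            f'  For <prosody rate="slow">{escape(label)}</prosody>, '
            f'press <say-as interpret-as="digits">{escape(digit)}</say-as>.'
        )
        lines.append('  <break time="300ms"/>')
    lines.append("</speak>")
    return "\n".join(lines)


menu = build_menu_ssml(
    "Acme Corporation",
    [("1", "sales"), ("2", "technical support"), ("3", "billing")],
)
print(menu)
```

The generated text can be pasted into the SSML editor and previewed like any hand-written prompt.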

Step 5: Dynamic Prompts with Variables

Enable Dynamic Content:

  1. In prompt editor, toggle Dynamic Content → Enabled
  2. Use {{variable_name}} syntax for placeholders
  3. Configure variable sources (CRM, database, caller info)
  4. Preview with sample data

Example Dynamic Prompt:

Good {{time_of_day}}, {{caller_name}}. 

Thank you for calling {{company_name}}. 

{{#if has_open_ticket}}
I see you have an open support ticket, number {{ticket_number}}, 
regarding {{ticket_subject}}. 

Press 1 to speak with {{assigned_agent}}, or press 2 for the main menu.
{{else}}
For sales, press 1. For support, press 2.
{{/if}}

Variable Configuration:

Variable: {{caller_name}}
Source: Caller ID Lookup → CRM Contact
Fallback: "valued customer"

Variable: {{time_of_day}}
Source: System Time
  6am-12pm: "morning"
  12pm-5pm: "afternoon"
  5pm-9pm: "evening"

Variable: {{has_open_ticket}}
Source: CRM API Query
Query: "SELECT COUNT(*) FROM tickets WHERE phone = {{caller_number}} AND status = 'Open'"
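The {{time_of_day}} mapping above can be expressed as a small function. This is a sketch under stated assumptions: each range is treated as start-inclusive and end-exclusive, and because the table defines no label between 9pm and 6am, the function returns None there so that the caller can choose a neutral greeting.

```python
from datetime import time
from typing import Optional


def time_of_day(now: time) -> Optional[str]:
    """Map a clock time to the labels in the configuration table above."""
    hour = now.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    # Assumption: no label is configured for 9pm-6am, so report "no value".
    return None


for sample in (time(9, 30), time(14, 0), time(19, 45), time(23, 0)):
    print(sample, time_of_day(sample))
```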

Use Cases & Examples

Professional IVR Menu

Scenario: Replace outdated recorded IVR with modern TTS

Traditional Approach:

  • Hire voice actor: $300-500
  • Studio recording: $200-400
  • Editing and mastering: $100-200
  • Total: $600-1,100
  • Update time: 2-5 business days

TTS Approach:

  • Text entry: 5 minutes
  • Voice selection: 2 minutes
  • Preview and adjust: 3 minutes
  • Total cost: ~$0.15
  • Update time: 10 minutes

TTS Script:

<speak>
  <emphasis level="strong">Welcome</emphasis> to Acme Corporation.
  <break time="500ms"/>
  
  For <prosody rate="slow">sales and new orders</prosody>, 
  press <say-as interpret-as="digits">1</say-as>.
  <break time="400ms"/>
  
  For <prosody rate="slow">customer support and technical assistance</prosody>, 
  press <say-as interpret-as="digits">2</say-as>.
  <break time="400ms"/>
  
  For <prosody rate="slow">billing and account questions</prosody>, 
  press <say-as interpret-as="digits">3</say-as>.
  <break time="400ms"/>
  
  To hear these options again, press <say-as interpret-as="digits">9</say-as>.
  <break time="500ms"/>
  
  Or, stay on the line for the next available representative.
</speak>

Result: Professional, clear IVR that can be updated at any time at negligible cost.


Queue & Wait Time Messages

Dynamic Queue Status:

Thank you for your patience. 

You are currently number {{queue_position}} in line. 

Estimated wait time is {{wait_time}} minutes.

{{#if queue_position > 5}}
To receive a callback when an agent is available, press 1.
{{else}}
An agent will be with you shortly.
{{/if}}

To return to the main menu, press the star key.

Real-Time Variables:

  • {{queue_position}}: Updated in real-time as queue moves
  • {{wait_time}}: Calculated based on average handle time
  • Queue position > 5: Offer callback option
  • Regenerated automatically with current values

Business Hours Announcement

Smart Hours Message:

{{#if is_business_hours}}
  Thank you for calling Acme Corporation. 
  Our business hours are Monday through Friday, 9 AM to 6 PM Eastern Time.
  All of our representatives are currently assisting other customers.
  Please hold, and the next available agent will be with you shortly.
  
{{else if is_weekend}}
  Thank you for calling Acme Corporation.
  You have reached us outside of our normal business hours.
  Our office is open Monday through Friday, 9 AM to 6 PM Eastern Time.
  Please leave a message after the tone, and we will return your call on the next business day.
  For urgent matters, press 9 to reach our emergency support line.
  
{{else if is_holiday}}
  Thank you for calling Acme Corporation.
  Our office is closed today for {{holiday_name}}.
  We will reopen on {{next_business_day}} at 9 AM Eastern Time.
  For urgent matters, please press 9 to reach our emergency support line.
  Otherwise, please leave a message, and we will return your call when we reopen.
{{/if}}

Automatic Schedule Updates:

  • Business hours from calendar
  • Holiday schedule from admin settings
  • Next business day calculated automatically
  • No manual prompt updates needed

Personalized Caller Greeting

VIP Customer Recognition:

Welcome back, {{caller_name}}. 

Thank you for being a valued {{account_tier}} member since {{customer_since}}.

{{#if has_recent_order}}
Your recent order, number {{order_number}}, is {{order_status}}.
{{#if order_status == 'shipped'}}
Tracking shows delivery expected {{delivery_date}}.
{{/if}}
{{/if}}

{{#if has_assigned_account_manager}}
To speak with your dedicated account manager, {{manager_name}}, press 1.
Otherwise, press 2 for our main menu.
{{else}}
For sales, press 1. For support, press 2. For billing, press 3.
{{/if}}

Data Sources:

  • CRM: Account tier, customer since date, account manager
  • Order system: Recent orders, status, delivery tracking
  • Smart routing based on customer relationship

Multi-Language Support

Language Selection with Regional Voices:

English (US):

Voice: Joanna (Professional US Female)
"Welcome to Acme Corporation. For English, press 1. 
Para español, oprima el dos."

Spanish (ES):

Voice: Lucia (Professional Spanish Female)
"Bienvenido a Acme Corporation. Para continuar en español, 
oprima el uno. For English, press two."

French (FR):

Voice: Céline (Professional French Female)
"Bienvenue chez Acme Corporation. Pour continuer en français, 
appuyez sur le un. For English, press two."

Implementation:

  • Language detection from caller ID or IVR selection
  • Switch voices based on chosen language
  • Consistent prompts across all languages
  • Update all language versions simultaneously

Best Practices

Writing Effective TTS Scripts

Do's:

  • ✅ Write conversationally (how people speak, not write)
  • ✅ Use short sentences and phrases
  • ✅ Spell out numbers for clarity ("one" not "1")
  • ✅ Use punctuation for natural pauses
  • ✅ Test pronunciation with preview before deploying
  • ✅ Consider caller's perspective and information needs

Don'ts:

  • ❌ Don't use overly formal or complex language
  • ❌ Don't write run-on sentences (listeners can't rewind)
  • ❌ Don't assume pronunciation (spell phonetically if needed)
  • ❌ Don't overuse emphasis or special effects
  • ❌ Don't forget pauses between menu options

Pronunciation Control

Common Challenges:

Acronyms & Abbreviations:

<!-- Wrong: TTS might read "PBX" as a word ("Pibix") -->
PBX system

<!-- Right: Force spelling -->
<say-as interpret-as="spell-out">PBX</say-as> system
<!-- Output: "P B X system" -->

Company & Product Names:

<!-- If mispronounced, use phonetic spelling -->
<!-- Wrong pronunciation -->
Acme

<!-- Force phonetic spelling -->
<phoneme alphabet="ipa" ph="ˈæk.mi">Acme</phoneme>

<!-- Or spell it out -->
<say-as interpret-as="spell-out">ACME</say-as>

Numbers & Codes:

<!-- Account numbers: read digit by digit -->
Your account number is <say-as interpret-as="digits">123456</say-as>
<!-- Output: "one two three four five six" -->

<!-- Prices: use currency -->
Your balance is <say-as interpret-as="currency" language="en-US">$245.30</say-as>
<!-- Output: "two hundred forty-five dollars and thirty cents" -->

<!-- Phone numbers -->
<say-as interpret-as="telephone">555-1234</say-as>
<!-- Output: "five five five, one two three four" -->

Voice Selection Tips

Match Voice to Use Case:

Customer Service IVR:
  Best: Joanna (US), Emma (UK) - Professional, friendly
  Avoid: Joey - Too casual for business

Emergency/Security Alerts:
  Best: Matthew (US), Brian (UK) - Authoritative, clear
  Avoid: Soft voices - May lack urgency

Marketing/Sales Prompts:
  Best: Jennifer (US), Amy (UK) - Warm, engaging
  Avoid: Overly formal voices

Technical Instructions:
  Best: Joanna (US), Emma (UK) - Clear, moderate pace
  Avoid: Fast-paced voices

Multi-Lingual Support:
  Best: Native accent voices for each language
  Avoid: English voice attempting other languages

Performance Optimization

Caching & Pre-Generation:

Static Prompts (rarely change):

Generate Once, Cache Forever:
  - Main IVR menu
  - Company greeting
  - Standard instructions
  
Benefits:
  ✅ Instant playback (no generation delay)
  ✅ No usage fees after initial generation
  ✅ Consistent experience

Dynamic Prompts (change per call):

Generate On-Demand:
  - Personalized greetings with {{caller_name}}
  - Queue position messages with {{queue_position}}
  - Account-specific information
  
Considerations:
  ⏱️ 100-300ms generation latency
  💰 Per-generation usage fees
  🔄 Short cache TTL (5-10 seconds)

Hybrid Approach:

Pre-generate templates with common variables:
  "Welcome back, {{caller_name}}" → Cache multiple versions
  "You are number [X] in line" → Pre-generate 1-20
  
Result: Fast playback + personalization
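A minimal sketch of the hybrid approach follows: queue-position prompts 1-20 are generated once and stored, and anything outside that range falls back to on-demand generation. The synthesize callable is a placeholder for whatever call actually produces audio; it and the helper names are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def pregenerate_queue_prompts(synthesize: Callable[[str], bytes],
                              max_position: int = 20) -> Dict[int, bytes]:
    """Generate 'You are number N in line.' once for N = 1..max_position."""
    return {
        position: synthesize(f"You are number {position} in line.")
        for position in range(1, max_position + 1)
    }


def queue_prompt(cache: Dict[int, bytes],
                 synthesize: Callable[[str], bytes],
                 position: int) -> bytes:
    """Serve from the cache when possible; otherwise generate on demand."""
    if position in cache:
        return cache[position]
    return synthesize(f"You are number {position} in line.")


# Stand-in for a real TTS call: records each request and returns fake audio.
calls: List[str] = []


def fake_synthesize(text: str) -> bytes:
    calls.append(text)
    return text.encode("utf-8")


cache = pregenerate_queue_prompts(fake_synthesize)
audio = queue_prompt(cache, fake_synthesize, 3)  # served from cache, no new call
print(len(cache), len(calls))
```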

Pricing

Usage-Based Pricing

Standard Voices:

Cost: $4 per 1 million characters
Per-character: ~$0.000004

Examples:
  "Thank you for calling." (22 chars) = $0.000088
  Full IVR menu (500 chars) = $0.002
  1,000 menu plays/month = $2.00/month

Neural Voices (Recommended):

Cost: $16 per 1 million characters
Per-character: ~$0.000016

Examples:
  "Thank you for calling." (22 chars) = $0.000352
  Full IVR menu (500 chars) = $0.008
  1,000 menu plays/month = $8.00/month
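The arithmetic behind these examples can be reproduced with a short calculation. The rates come from the tables above; the function name and the use of Decimal (to avoid floating-point rounding in very small amounts) are choices made for this sketch.

```python
from decimal import Decimal

# Published rates in dollars per 1 million characters.
RATE_PER_MILLION = {"standard": Decimal("4"), "neural": Decimal("16")}


def generation_cost(characters: int, voice_type: str) -> Decimal:
    """Cost in dollars of generating `characters` billable characters once."""
    return RATE_PER_MILLION[voice_type] * characters / Decimal(1_000_000)


menu_standard = generation_cost(500, "standard")
menu_neural = generation_cost(500, "neural")
print("500-char menu, standard:", menu_standard)
print("500-char menu, neural:  ", menu_neural)
print("1,000 neural generations per month:", menu_neural * 1000)
```

The monthly figure applies only when the prompt is regenerated on every play; a cached prompt is billed once.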

SSML Characters:

SSML markup does NOT count toward character usage.

Example:
<speak>
  <emphasis>Hello</emphasis> <break time="500ms"/> world
</speak>

Billable characters: 11 ("Hello world")
Non-billable: SSML tags (<speak>, <emphasis>, <break>)
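To estimate billable characters before generating, the markup can be stripped with a standard XML parser. This sketch assumes that runs of whitespace left behind by removed tags are collapsed to single spaces; how the platform counts whitespace is not stated here, so treat the result as an estimate.

```python
import xml.etree.ElementTree as ET


def billable_characters(ssml: str) -> int:
    """Count text characters in an SSML document, ignoring all tags."""
    root = ET.fromstring(ssml)
    text = "".join(root.itertext())
    # Assumption: whitespace between tags collapses to single spaces.
    return len(" ".join(text.split()))


example = """<speak>
  <emphasis>Hello</emphasis> <break time="500ms"/> world
</speak>"""
print(billable_characters(example))
```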

Cost Optimization

Strategies to Reduce Costs:

1. Cache Static Prompts:

Generate once, use unlimited times:
  Main menu: Generate once = $0.008
  Played 10,000 times = Still $0.008 total
  
Savings: 99.99% vs. generating every time

2. Use Standard Voices for Non-Critical Prompts:

Neural voice: $0.008 per menu
Standard voice: $0.002 per menu
  
For internal/hold music announcements where quality is less critical
Savings: 75%

3. Optimize Text Length:

❌ Wordy: "We would like to take this opportunity to thank you for taking the time to call our company today." (98 chars)

✅ Concise: "Thank you for calling." (22 chars)

Savings: 78% fewer characters = 78% lower cost

4. Pre-Generate Common Variables:

Instead of: "You are number {{queue_position}} in line"
Pre-generate: "You are number 1 in line" through "You are number 20 in line"
  
Generate 20 versions once vs. thousands of dynamic generations
Savings: 95%+ for high-traffic queues

Troubleshooting

Pronunciation Issues

Problem: TTS mispronounces company name, product, or acronym

Solutions:

1. Phonetic Spelling:

Wrong: "TheVoĉo" (pronounced "vo-ko")
Right: "Voco" (should be "vo-ca")

Fix: Phonetic helper
<phoneme alphabet="ipa" ph="ˈvoʊ.koʊ">TheVoĉo</phoneme>

2. Spell Out:

<say-as interpret-as="spell-out">VOCO</say-as>
<!-- Output: "V O C O" -->

3. Alternative Spelling:

Wrong: "Acme" (pronounced "Ack-mee")
Try: "Ackmee" or "Ack-me"

4. Build Custom Lexicon:

<!-- Admin → TTS → Custom Pronunciations -->
Word: SQL
Pronunciation: "sequel" or "S Q L"

Word: Kubernetes  
Pronunciation: "koo-ber-net-eez"

Voice Quality Issues

Problem: TTS voice sounds robotic or unnatural

Solutions:

1. Upgrade to Neural Voices:

Standard Voice → Neural Voice
Cost: 4x the standard rate ($16 vs. $4 per 1 million characters), but significantly more natural

2. Add SSML Prosody:

<!-- Add natural variation -->
<prosody rate="95%" pitch="-1st">
  Welcome to our company.
</prosody>

3. Use Punctuation:

❌ "Welcome to Acme Corporation for sales press 1 for support press 2"

✅ "Welcome to Acme Corporation. For sales, press 1. For support, press 2."

4. Add Breaks:

<speak>
  Welcome to Acme Corporation.
  <break time="500ms"/>
  For sales, press 1.
  <break time="300ms"/>
  For support, press 2.
</speak>

Generation Failures

Problem: TTS generation fails or takes too long

Diagnostic:

Check TTS Dashboard:
  - API Status: ✅ Operational
  - Queue Depth: 0 requests
  - Average Generation Time: 250ms
  - Error Rate: 0.0%

Common Causes:

1. Invalid SSML:

<!-- Wrong: Unclosed tag -->
<speak>
  <emphasis>Hello
</speak>

<!-- Right: Properly closed -->
<speak>
  <emphasis>Hello</emphasis>
</speak>
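Unclosed or mismatched tags can be caught before submission by checking that the markup is well-formed XML. The sketch below uses Python's standard XML parser; it verifies structure only and does not know which SSML tags or attributes the TTS engine accepts. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def ssml_error(ssml: str) -> Optional[str]:
    """Return a description of the first structural problem, or None if none found."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return f"not well-formed XML: {exc}"
    if root.tag != "speak":
        return f"root element is <{root.tag}>, expected <speak>"
    return None


broken = "<speak>\n  <emphasis>Hello\n</speak>"
fixed = "<speak>\n  <emphasis>Hello</emphasis>\n</speak>"
print(ssml_error(broken))
print(ssml_error(fixed))
```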

2. Text Too Long:

Max characters: 3,000 per generation
Solution: Split into multiple prompts
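One way to split a long plain-text script is to break at sentence boundaries and pack sentences into chunks that stay under the limit. This is a sketch for plain text only (splitting SSML would also require keeping tags balanced), and the sentence-boundary rule used here is a simple assumption.

```python
import re
from typing import List

MAX_CHARS = 3000  # per-generation limit stated above


def split_prompt(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking after sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("a single sentence exceeds the character limit")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = split_prompt("For sales, press 1. For support, press 2. For billing, press 3.", limit=45)
for part in parts:
    print(len(part), part)
```

Each chunk can then be saved as its own prompt and played in sequence.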

3. Unsupported Characters:

Remove special characters: ©, ™, ®, emoji
Use: (c), (TM), (R), spelled-out emotions
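The symbol replacements listed above are easy to automate before text is submitted. This sketch covers only the three symbols named; emoji are left untouched because choosing a spelled-out equivalent needs human judgment.

```python
# Replacements taken from the list above.
REPLACEMENTS = {"©": "(c)", "™": "(TM)", "®": "(R)"}


def sanitize_prompt(text: str) -> str:
    """Replace symbols the TTS engine may reject with plain-text equivalents."""
    for symbol, replacement in REPLACEMENTS.items():
        text = text.replace(symbol, replacement)
    return text


print(sanitize_prompt("Acme™ Cloud © 2025 Acme Corporation®"))
```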

4. Rate Limiting:

Limit: 100 requests per minute
Solution: Implement request queuing or upgrade plan
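Request queuing can be as simple as tracking when recent requests were sent and waiting until the oldest one leaves the one-minute window. The class below is a client-side sketch, not a description of how the service enforces its limit; the injectable clock exists only to make the logic testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self._sent.append(self._clock())


def send_when_allowed(limiter: SlidingWindowLimiter, send: Callable[[], None]) -> None:
    """Block until the limiter permits a request, then send it."""
    delay = limiter.wait_time()
    if delay > 0:
        time.sleep(delay)
    limiter.record()
    send()
```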

Advanced Features

Voice Cloning (Enterprise)

Custom Voice Creation:

Create a custom voice based on your company spokesperson or brand:

Process:

  1. Record 30-60 minutes of high-quality audio
  2. Submit audio for voice training (2-4 weeks)
  3. Custom voice becomes available in TTS engine
  4. Use custom voice across all IVR prompts

Benefits:

  • Consistent brand voice across all channels
  • Professional spokesperson without ongoing recording costs
  • Update prompts anytime in spokesperson's voice
  • Multi-language support with same voice characteristics

Cost: Contact sales for pricing (Enterprise plan)


A/B Testing

Test Voice Effectiveness:

Scenario: Which voice/script converts better?

Version A:

Voice: Matthew (Authoritative Male)
"Press 1 for sales."

Version B:

Voice: Jennifer (Friendly Female)
"If you'd like to speak with our sales team, press 1."

Metrics:

  • Conversion rate (press 1 vs. hang up)
  • Average time to decision
  • Caller satisfaction

Implementation:

  • Route 50% of calls to each version
  • Track metrics for 1-2 weeks
  • Deploy winning version to 100%
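A 50/50 split can be made deterministic by hashing the caller's number, so that a repeat caller always hears the same version during the test. This is one possible assignment scheme, shown as an assumption; the platform's own routing may work differently.

```python
import hashlib


def assign_version(caller_number: str) -> str:
    """Assign a caller to version 'A' or 'B' based on a hash of their number."""
    digest = hashlib.sha256(caller_number.encode("utf-8")).digest()
    # The first byte is effectively uniform, so its parity gives a roughly even split.
    return "A" if digest[0] % 2 == 0 else "B"


for number in ("+15551230001", "+15551230002", "+15551230003"):
    print(number, assign_version(number))
```

Hashing avoids storing an assignment table, at the cost of not being able to rebalance the groups mid-test.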

Getting Help

TTS Support

Need help with Text-to-Speech?

Common Questions:

  • Pronunciation issues: Submit word with desired pronunciation
  • Voice selection: Request voice samples for your use case
  • SSML help: Provide desired effect, we'll suggest markup
  • Cost optimization: Share your usage, we'll recommend strategies

Contact:

  • Email: [email protected]
  • Include: Text, desired output, current result (if applicable)
  • Response time: < 4 hours (business hours)

Resources:

  • 📹 Video Tutorial: Creating Professional IVR with TTS (8 minutes)
  • 📄 SSML Quick Reference Guide (PDF)
  • 🎧 Voice Samples: Listen to all available voices
  • 📚 TTS Best Practices Handbook

Next Steps

Get Started with TTS:

  1. ✅ Log into Admin Portal → IVR Builder
  2. ✅ Create or edit existing IVR
  3. ✅ Add TTS prompt with simple text
  4. ✅ Preview and adjust voice settings
  5. ✅ Deploy and test with live call
  6. ✅ Iterate based on caller feedback
