UChat Official

Introduction

This summary explores the advanced functionality and configuration of UChat's real-time voice AI system, as presented in the transcript.

Designed for users familiar with basic setup, this guide delves into advanced features such as AI functions, voice customization, call recordings, and multi-channel integrations.

The goal is to provide a clear, structured understanding of how to leverage these capabilities to enhance automated voice interactions, ensuring seamless integration, natural voice quality, and efficient call management.

Core Concepts and Features

1. AI Functions: Extending Capabilities Beyond Basic Responses

The system supports custom AI functions that enable the AI agent to perform specific tasks during calls, such as fetching weather data, evaluating call termination, or transferring calls to human agents.

| Function Name | Purpose | Description | Workflow Trigger | Parameters |
| --- | --- | --- | --- | --- |
| Get Weather | Retrieve weather info | Checks weather based on user-provided location | External API call (simulated in demo) | city (string) |
| Evaluate Call Termination | Decide if call should end | AI determines whether the conversation has concluded | Internal logic | None |
| Transfer to Human Agent | Escalate call | Transfers the call to a human operator for urgent or complex issues | Call transfer workflow | phone number, transfer message |

Implementation Highlights:

  • Creating Functions: Each function is defined with a name, description, and parameters.

  • Triggering Workflows: Functions invoke workflows that perform actions like API calls or call transfers.

  • Parameter Collection: For example, Get Weather prompts the user for a city, then triggers a workflow to fetch weather data.

  • Response Handling: The AI responds with the data, e.g., "The weather in Brisbane is sunny and 25°C."
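The function-plus-workflow pattern above can be sketched in code. This is a hypothetical illustration of how a "Get Weather" function definition and its workflow handler fit together; the schema field names are illustrative, not UChat's exact internal format, and the weather data is canned, as in the demo.

```python
# Hypothetical "Get Weather" AI function definition. The schema mirrors
# a typical function-calling setup; field names are illustrative only.
GET_WEATHER_FUNCTION = {
    "name": "get_weather",
    "description": "Checks the weather for a user-provided location.",
    "parameters": {
        "city": {"type": "string", "description": "City to look up"},
    },
}

def get_weather_workflow(city: str) -> str:
    """Simulated workflow step: a real flow would call an external
    weather API here. The transcript's demo used canned data."""
    fake_api_result = {"condition": "sunny", "temperature_c": 25}
    return (f"The weather in {city} is {fake_api_result['condition']} "
            f"and {fake_api_result['temperature_c']}°C.")

# The AI agent collects the `city` parameter, invokes the workflow,
# and speaks the returned text back to the caller.
print(get_weather_workflow("Brisbane"))
# → The weather in Brisbane is sunny and 25°C.
```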

2. Connecting AI Functions to the Main Call Flow

The integration process involves:

  • Selecting relevant AI functions within the AI agent configuration.

  • Linking functions to specific decision points in the call flow.

  • Using "Stop AI Agent" actions to control flow transitions.

  • Configuring conditional branches based on function outcomes, such as transferring to a human agent or ending the call.

Example Workflow:

  • User requests weather → Get Weather function triggers → AI responds with weather info.

  • User requests escalation → Transfer to Human Agent function triggers → Call is transferred with a message.

3. Call Termination and Transfer Logic

  • Evaluate Call Termination: The AI assesses if the call should end, playing a closing message like "Thank you for your call, goodbye," and then hanging up.

  • Transfer to Human Agent: When necessary, the system transfers the call to a specified phone number, optionally providing a call whisper (background message) to the recipient.

Flow Control:

  • After function execution, the system uses "Stop AI Agent" to proceed to the next step.

  • Call transfer options include success/failure handling and whisper messages for context.
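The branching described above can be summarized as a small decision function. UChat expresses this visually with "Stop AI Agent" actions and conditional paths; the outcome strings below are illustrative labels, not real node names.

```python
# Minimal sketch of post-function flow control: given which AI function
# just ran and its outcome, pick the next branch of the call flow.
def next_step(function_name: str, outcome: dict) -> str:
    if function_name == "evaluate_call_termination" and outcome.get("should_end"):
        # Play a closing message ("Thank you for your call, goodbye"),
        # then hang up.
        return "play_goodbye_then_hang_up"
    if function_name == "transfer_to_human_agent":
        # Success/failure branches, with an optional whisper message
        # giving the human recipient context.
        if outcome.get("transfer_ok"):
            return "transfer_with_whisper"
        return "transfer_failed_branch"
    return "continue_conversation"

print(next_step("evaluate_call_termination", {"should_end": True}))
# → play_goodbye_then_hang_up
```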

Voice Customization and Quality

1. Using ElevenLabs for Voice Consistency

To ensure a natural, consistent voice across all interactions, the system integrates with ElevenLabs, a voice synthesis platform.

Setup Steps:

  • Obtain Voice ID: From the ElevenLabs voice library, select a voice (e.g., an expressive Indian voice), add it to your collection, and copy the Voice ID.

  • Configure in UChat:

    • In the AI Agent settings, select ElevenLabs as the voice provider.

    • Paste the Voice ID into the AI agent configuration.

    • Repeat for message nodes and start flow settings to ensure uniform voice output.
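Behind the scenes, each spoken response involves a text-to-speech call to ElevenLabs using the configured Voice ID. The sketch below shows the shape of such a request; the endpoint follows ElevenLabs' public API, but check their documentation for current model IDs and voice settings, and note that `YOUR_VOICE_ID` and `YOUR_API_KEY` are placeholders.

```python
# Sketch of an ElevenLabs text-to-speech request, of the kind UChat
# performs when ElevenLabs is the voice provider for a node.
def build_tts_request(voice_id: str, text: str, api_key: str):
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = {"text": text, "model_id": "eleven_multilingual_v2"}
    return url, headers, body

url, headers, body = build_tts_request(
    voice_id="YOUR_VOICE_ID",   # copied from the ElevenLabs voice library
    text="The weather in Brisbane is sunny and 25 degrees.",
    api_key="YOUR_API_KEY",
)
# A real call would be: requests.post(url, headers=headers, json=body);
# the response body is the synthesized audio. This round trip is the
# source of the ~2-3 second latency noted below.
```

Reusing the same `voice_id` for the AI agent, message nodes, and start flow is what produces the unified voice described above.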

Advantages:

  • Unified Voice: Same voice across AI agent and message nodes.

  • Expressiveness: Rich, natural-sounding voices enhance user experience.

Limitations:

  • Latency: Sending text to ElevenLabs for synthesis introduces a delay (~2-3 seconds), which is generally acceptable but noticeable.

  • Processing Time: Additional processing may slightly impact real-time responsiveness.

2. Voice Quality Management

  • Voice Providers: Default voices are from Google or other providers, which may sound inconsistent.

  • Custom Voices: ElevenLabs allows custom, expressive voices, improving realism.

  • Implementation Tip: Use the same voice ID across all nodes for consistency.

Call Recordings and Transcriptions

1. Enabling Call Recordings

  • Default Setting: Turned off to save costs and ensure privacy.

  • Activation:

    • In UChat settings, enable Call Recordings.

    • Select call categories (inbound, outbound, all).

    • Recordings are stored in your S3 storage (requires integration).

2. Accessing and Using Recordings

  • Recordings are accessible via the Content > Recordings section.

  • Costs are incurred for storage and recording duration.

  • Playback: Listen to recordings directly within UChat.

3. Post-Call Processing

  • Trigger on Recording Completion: When a call ends, a webhook fires with the call ID and recording URL.

  • Transcription & Summarization:

    • Use third-party services such as ElevenLabs Speech-to-Text or others.

    • Input the recording URL to transcribe the conversation.

    • Use AI to generate summaries or evaluate call quality.
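The post-call pipeline above can be sketched as a webhook handler. This is a hypothetical illustration: the payload field names (`call_id`, `recording_url`) are assumed, so inspect an actual webhook delivery to confirm them before building on this.

```python
# Hypothetical handler for the recording-completion webhook: receive
# the call ID and S3 recording URL, then queue transcription and
# AI summarization steps.
def handle_recording_webhook(payload: dict) -> dict:
    call_id = payload["call_id"]            # assumed field name
    recording_url = payload["recording_url"]  # assumed field name
    # Hand the recording URL to a speech-to-text service, then
    # summarize the transcript and evaluate call quality with AI.
    return {
        "call_id": call_id,
        "transcription_request": {"audio_url": recording_url},
        "next_steps": ["transcribe", "summarize", "evaluate_quality"],
    }

result = handle_recording_webhook({
    "call_id": "call_123",
    "recording_url": "https://your-bucket.s3.amazonaws.com/call_123.mp3",
})
```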

4. Debugging and Audio Stitching

  • Debug Audio: Stores stitched-together audio for troubleshooting.

  • Limitations:

    • AI responses may be cut off if interrupted.

    • Latency in stitching can cause delays (~7-8 seconds), but actual conversation latency remains low (~2 seconds).

Triggering Calls and Multi-Channel Integration

1. Inbound Calls

  • The system can automatically handle inbound calls, triggering the AI flow based on caller input.

  • Webhooks: External systems can send data (e.g., form submissions) to initiate calls via inbound webhooks.
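An external system initiating a call through an inbound webhook might look like the sketch below. The URL is a placeholder (copy the real webhook URL from your flow's trigger node), and the payload fields (`phone`, `name`, `source`) are assumptions standing in for whatever your form submits.

```python
import json
import urllib.request

# Sketch: a web form backend POSTs submission data to a UChat inbound
# webhook, which starts the flow that places the AI call.
def trigger_call(webhook_url: str, phone: str, name: str) -> urllib.request.Request:
    payload = {"phone": phone, "name": name, "source": "contact_form"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = trigger_call(
    "https://example.com/your-inbound-webhook",  # placeholder URL
    phone="+61400000000",
    name="Alex",
)
# Sending it with urllib.request.urlopen(req) would fire the flow.
```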

2. Outbound Calls and Notifications

  • Use "Make a Phone Call" actions in other channels (Messenger, email, etc.) to trigger outbound calls.

  • Setup:

    • Select caller and recipient numbers.

    • Attach AI agent nodes for automated handling.

    • Provide optional payloads for context.

3. Notification Triggers

  • Trigger calls based on events, such as form submissions or scheduled notifications.

  • Configure start nodes with AI agent references for seamless handover.

Practical Demonstrations

1. Weather Inquiry Call

  • User: "Can you tell me the weather in Brisbane?"

  • AI: "The weather in Brisbane is sunny and 25°C."

  • Function triggers automatically, providing real-time data.

2. Call Transfer to Human Agent

  • User: "I want to talk to a human."

  • AI: "Transferring your call to a human agent."

  • Call is transferred with a whisper message for context.

3. Call Termination

  • User: "Thanks, that's all."

  • AI: "Thank you for your call. Goodbye."

  • Call ends automatically after evaluation.

Summary and Best Practices

  • Leverage AI Functions for dynamic, task-specific interactions.

  • Configure Voice Consistency with ElevenLabs for a natural user experience.

  • Utilize Call Recordings for quality assurance and post-call analysis.

  • Integrate Multi-Channel Triggers for flexible automation.

  • Balance Latency and Quality when using third-party voice synthesis.

  • Test Thoroughly to ensure smooth transitions between AI responses, function calls, and transfers.

Final Thoughts

This advanced setup empowers organizations to create highly interactive, natural, and efficient voice automation systems.

By combining AI functions, custom voice synthesis, call recordings, and multi-channel triggers, users can deliver superior customer experiences while maintaining control over call flow and quality.

Proper configuration and testing are essential to maximize these capabilities, ensuring the voice AI system operates seamlessly in real-world scenarios.