TTS - Text to Speech

Overview

This section contains the blocks that can be used for text-to-speech conversion.

Select your preferred provider from those integrated with XCALLY, described in the following sections.

Important

Please note that TTS providers are third-party applications. Their functionalities, costs, and behaviors depend on the specific provider you choose.

Requirements for TTS blocks to function properly:

  • A stable internet connection.

  • A valid account with the selected TTS provider.

 

Google Cloud TTS

To obtain the key, see the documentation page How to retrieve Google Key for Cally Square blocks.

The Google Cloud TTS block lets you perform Text-to-Speech conversion using Google Cloud TTS.

Fields

  • Label: enter a short description for the block

  • Provider: select the Google Cloud Provider account from the dropdown list

  • Text Type: select Text (PlainText) or SSML (Speech Synthesis Markup Language; see Google's official documentation for details on how to use it)

  • Text: enter the text you would like to convert to speech using TTS

  • Language Code: the language you would like to use

  • Voice Type: select one of the available types, such as News, Standard, Studio, or Polyglot

  • Voice Name: select the voice to use; the choices combine gender (Female/Male) and voice type

  • Speaker Type: select the audio device profile, such as smartphones, headphones, or car speakers

  • Speed: define the speed at which the TTS engine reads the text

  • Pitch: adjust the intonation of the generated voice
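
When Text Type is set to SSML, the Text field must contain well-formed markup. The following Python sketch shows one way to prepare such a string outside XCALLY. The helper name to_ssml and the 500 ms pause are illustrative assumptions; only the standard SSML speak and break elements are used.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=500):
    """Join sentences into one SSML document with a pause between them.

    Hypothetical helper, not part of XCALLY: it only builds the string
    that you would paste into the Text field.
    """
    # Escape &, < and > so that plain text cannot break the markup.
    parts = [escape(sentence) for sentence in sentences]
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml(["Welcome to ACME & Co.", "Please hold."]))
```

Escaping the input first keeps characters such as the ampersand from producing invalid SSML.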

Exit Arrows

The Google Cloud TTS block provides just one arrow out to the next step

 

Google TTS

The Google TTS block lets you perform Text-to-Speech conversion using Google TTS. It is intended for internal testing.

Fields

  • Label: enter a short description for the block

  • Text: enter the text you would like to convert to speech using TTS. The maximum text length allowed is 200 characters.

  • Language: select the language you would like to use from the dropdown list
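
Because the Text field accepts at most 200 characters, a longer prompt has to be split across several blocks. The following Python sketch shows one possible way to split a prompt on word boundaries before entering it. The function name split_for_tts and the word-boundary strategy are assumptions made for illustration, not XCALLY features.

```python
from __future__ import annotations


def split_for_tts(text: str, limit: int = 200) -> list[str]:
    """Split text into chunks of at most `limit` characters, on word boundaries."""
    chunks = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            # A single word that exceeds the limit cannot be placed anywhere.
            raise ValueError(f"word longer than {limit} characters")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full: store it and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_for_tts("Thank you for calling. " * 20):
        print(len(chunk), chunk)
```

Each returned chunk can then be entered in its own block.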

Exit Arrows

The Google TTS block provides just one arrow out to the next step

We do not recommend using this block in a production environment, as it is intended for testing purposes only.

For production, we recommend using Google Cloud TTS.

 

iSpeech TTS

The iSpeech block lets you perform Text-to-Speech conversion using the iSpeech TTS AGI parameters.

Fields

  • Label: enter a short description for the block

  • Text: enter the text you would like to convert to speech using TTS

  • Key: insert your license key from the ispeech.org account

  • Language: select the language to use for the speech synthesis from the dropdown list

  • Speed: define the speed at which the TTS engine reads the text

  • Interrupt key: set a key or digit that, when pressed by the user, interrupts the ongoing speech playback.

Exit Arrows

The iSpeech block provides just one arrow out to the next step.

AWS Polly

The AWS Polly block lets you perform Text-to-Speech conversion using the AWS Polly AGI parameters.

Fields

  • Label: enter a short description for the block

  • Access Key ID and Secret Access Key: insert AWS security credentials. See AWS Polly documentation

  • Region: select the AWS regional endpoint. See AWS Polly documentation

  • Voice: select the voice used for the synthesis, from the dropdown list

  • Text: enter the text you would like to convert to speech using TTS

  • Text Type: specifies whether the input text is plain text or SSML. The default value is plain text. See AWS Polly documentation

Exit Arrows

The AWS Polly block provides just one arrow out to the next step

OpenAI TTS

Available from version 3.49.0

The OpenAI TTS block lets you perform Text-to-Speech conversion using OpenAI TTS.

Fields

  • Label: enter a short description for the block

  • OpenAI Cloud Provider: select the previously configured provider you want to use

  • Model: select the TTS model to use, either TTS-1 or TTS-1 High Definition

  • Voice Name: select the voice of the operator that will speak the output

  • Text: enter the text you would like to convert to speech using TTS

To learn more about text to speech with OpenAI, see the official OpenAI documentation.

Please note that the OpenAI output is in MP3 format.

Exit Arrows

The OpenAI TTS block provides just one arrow out to the next step

Lumenvox TTS

Important: you must install Lumenvox on a machine that is reachable by your system.

 

The Lumenvox TTS block lets you perform Text-to-Speech conversion using Lumenvox TTS.

Fields

  • Label: enter a short description for the block

  • Text: enter the text you would like to convert to speech using TTS

  • Options: here you can define details about the synthesis. Valid options are:

    • l - language to use (e.g. "en-GB", "en-US", "en-AU", etc.)

    • v - voice name to use (e.g. "Lindsey", "Chris", etc.)

    • g - voice gender to use (e.g. "male", "female")

    • p - profile to use, as specified in the mrcp.conf file

    • i - digits to allow the TTS to be interrupted with (can specify "any" to allow any digits to interrupt)

    • f - filename on disk to store audio to (audio not stored if not specified or empty)

    • epe - exit on a play error

    • pv - prosody volume (silent/x-soft/soft/medium/loud/x-loud/default)

    • pr - prosody rate (x-slow/slow/medium/fast/x-fast/default)

Multiple options can be provided by joining options with an ampersand, e.g. l=en-US&g=female
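
The ampersand-joined format can be produced programmatically. The following Python sketch builds an Options string from keyword arguments and rejects keys that are not in the list above. The function name build_options is hypothetical, and the sketch assumes that values are passed verbatim, without URL encoding.

```python
# Option keys accepted in the Options field, as listed in this section.
ALLOWED_OPTIONS = ("l", "v", "g", "p", "i", "f", "epe", "pv", "pr")


def build_options(**options):
    """Return an Options string such as 'l=en-US&g=female'."""
    unknown = sorted(set(options) - set(ALLOWED_OPTIONS))
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(unknown)}")
    for key, value in options.items():
        # An ampersand inside a value would be read as a separator.
        if "&" in str(value):
            raise ValueError(f"value of {key!r} must not contain '&'")
    return "&".join(f"{key}={value}" for key, value in options.items())


if __name__ == "__main__":
    print(build_options(l="en-US", g="female"))
```

The options appear in the output in the order in which they are passed.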

Exit Arrows

The Lumenvox TTS block provides just one arrow out to the next step

Sestek TTS

The Sestek TTS block lets you perform Text-to-Speech conversion using Sestek TTS.

Fields

  • Label: enter a short description for the block

  • Text: enter the text you would like to convert to speech using TTS

  • Options: they control details about the synthesis. Valid options are:

    • l - language to use (e.g. "en-GB", "en-US", "en-AU", etc.)

    • v - voice name to use (e.g. "Lindsey", "Chris", etc.)

    • g - voice gender to use (e.g. "male", "female")

    • p - profile to use, as specified in the mrcp.conf file

    • i - digits to allow the TTS to be interrupted with (can specify "any" to allow any digits to interrupt)

    • f - filename on disk to store audio to (audio not stored if not specified or empty)

    • epe - exit on a play error

    • pv - prosody volume (silent/x-soft/soft/medium/loud/x-loud/default)

    • pr - prosody rate (x-slow/slow/medium/fast/x-fast/default)

Multiple options can be provided by joining options with an ampersand, e.g. l=en-US&g=female

Exit Arrows

The Sestek TTS block provides just one arrow out to the next step

UniMRCP Synth

To make this block work you must install MRCP Synth on a machine that is reachable by your system.

How to install MRCP

The Media Resource Control Protocol (MRCP) is a network protocol based on the client/server model. MRCP allows client applications to control media service resources residing in servers.

  • Run the MRCP installation script

curl -u 'public:bs4#)W]h8+VK),RV' --silent --location https://repository.xcally.com/repository/provisioning/Scripts/ast_mrcp_install | bash

The script automatically executes everything needed.

  • Restart Asterisk to load the MRCP modules.

The UniMRCP Synth block lets you perform a Text-To-Speech conversion based on MRCP.

Fields

  • Label: enter a short description for the block

  • Text: enter the text you would like to convert to speech using TTS

  • Options: they control details about the synthesis. Valid options are:

    • l - language to use (e.g. "en-GB", "en-US", "en-AU", etc.)

    • v - voice name to use (e.g. "Lindsey", "Chris", etc.)

    • g - voice gender to use (e.g. "male", "female")

    • p - profile to use, as specified in the mrcp.conf file

    • i - digits to allow the TTS to be interrupted with (can specify "any" to allow any digits to interrupt)

    • f - filename on disk to store audio to (audio not stored if not specified or empty)

    • epe - exit on a play error

    • pv - prosody volume (silent/x-soft/soft/medium/loud/x-loud/default)

    • pr - prosody rate (x-slow/slow/medium/fast/x-fast/default)

Multiple options can be provided by joining options with an ampersand, e.g. l=en-US&g=female
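
Before pasting an Options string into the block, it can be useful to check it. The following Python sketch splits a string on ampersands and checks the pv and pr values against the standard SSML prosody names (for example, loud and x-loud for volume). The function name parse_options is hypothetical, and the check covers only these two options.

```python
PROSODY_VOLUME = {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"}
PROSODY_RATE = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}


def parse_options(option_string):
    """Parse 'key=value&key=value' into a dict and validate pv and pr."""
    options = {}
    # filter(None, ...) skips empty pieces, e.g. from a trailing '&'.
    for pair in filter(None, option_string.split("&")):
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"missing '=' in option {pair!r}")
        options[key] = value
    if "pv" in options and options["pv"] not in PROSODY_VOLUME:
        raise ValueError(f"invalid prosody volume: {options['pv']!r}")
    if "pr" in options and options["pr"] not in PROSODY_RATE:
        raise ValueError(f"invalid prosody rate: {options['pr']!r}")
    return options


if __name__ == "__main__":
    print(parse_options("l=en-US&g=female&pv=loud"))
```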

Exit Arrows

The UniMRCP Synth block provides just one arrow out to the next step

XCALLY Motion can be integrated with any ASR/TTS system based on MRCP. This section shows an example of such an integration (Lumenvox).

This configuration activity is recommended for advanced users only (system engineers)

Configuring UniMRCP to work with LumenVox

UniMRCP is a prerequisite for this integration: if it is not installed yet, see the previous section for instructions.

You need to configure:

  1. The mrcp.conf file in /etc/asterisk/

  2. The res-speech-unimrcp.conf file in /etc/asterisk/

  3. The lumenvox.xml file in /usr/local/unimrcp/conf/client-profiles/

  4. The unimrcpclient.xml file in /usr/local/unimrcp/conf/

Step 1: Configure res_unimrcp.so module by editing the mrcp.conf file

Here you can find a working mrcp.conf example (adapt it to your environment):

[general]

default-asr-profile = speech-lumenvox-mrcp1

default-tts-profile = speech-lumenvox-mrcp1

 

; UniMRCP logging level to appear in Asterisk logs.  Options are:

; EMERGENCY|ALERT|CRITICAL|ERROR|WARNING|NOTICE|INFO|DEBUG -->

log-level = DEBUG

max-connection-count = 100

offer-new-connection = 1

; rx-buffer-size = 1024

; tx-buffer-size = 1024

; request-timeout = 60

 

[speech-lumenvox-mrcp1]

; +++ MRCP settings +++

version = 1

;

; +++ RTSP +++

; === RTSP settings ===

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; Set this to the LumenVox Media Server's IP:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

server-ip = X.X.X.X

server-port = 554

; force-destination = 1

resource-location = media

speechsynth = speechsynthesizer

speechrecog = speechrecognizer

;

; +++ RTP +++

; === RTP factory ===

 

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; Set this to the XCALLY Motion machine's IP:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

rtp-ip = X.X.X.X

; rtp-ext-ip = auto

rtp-port-min = 4000

rtp-port-max = 5000

; === RTP settings ===

; --- Jitter buffer settings ---

playout-delay = 50

; min-playout-delay = 20

max-playout-delay = 200

; --- RTP settings ---

ptime = 20

codecs = PCMU PCMA L16/96/8000 telephone-event/101/8000

; --- RTCP settings ---

rtcp = 1

rtcp-bye = 2

rtcp-tx-interval = 5000

rtcp-rx-resolution = 1000

 

  1. Set the server-ip directive to the Lumenvox Media Server IP.

  2. Set the server-port directive to the Lumenvox Media Server port.

  3. Set the rtp-ip directive to the XCALLY Motion IP.
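
A quick check can confirm that the X.X.X.X placeholders were replaced. The following Python sketch reads the text of an mrcp.conf file with the standard configparser module and reports problems with the three directives above. The function name check_mrcp_conf is hypothetical, and configparser only approximates how Asterisk parses the file, so treat the result as a sanity check.

```python
import configparser
import ipaddress


def check_mrcp_conf(conf_text):
    """Return a list of problems found in the default TTS profile."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    profile = parser.get("general", "default-tts-profile", fallback=None)
    if profile is None or not parser.has_section(profile):
        return [f"default TTS profile not found: {profile!r}"]
    section = parser[profile]
    problems = []
    for key in ("server-ip", "rtp-ip"):
        value = section.get(key, "")
        try:
            ipaddress.ip_address(value)
        except ValueError:
            # Also catches the X.X.X.X placeholder left unchanged.
            problems.append(f"{key} is not a valid IP address: {value!r}")
    port = section.get("server-port", "")
    if not port.isdigit() or not 0 < int(port) < 65536:
        problems.append(f"server-port is not a valid port: {port!r}")
    return problems


if __name__ == "__main__":
    with open("/etc/asterisk/mrcp.conf", encoding="utf-8") as handle:
        for problem in check_mrcp_conf(handle.read()):
            print(problem)
```

An empty list means the three directives contain syntactically valid values; it does not verify that the servers are reachable.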

Step 2: Configure the res_speech_unimrcp.so module by editing the res-speech-unimrcp.conf file

Here you can find a working res-speech-unimrcp.conf example:

[general]

; UniMRCP named profile. Options are:

;unimrcp-profile = uni2      ; UniMRCP MRCPv2 Server

;unimrcp-profile = uni1     ; UniMRCP MRCPv1 Server

;unimrcp-profile = lv2      ; LumenVox MRCPv2 Server

unimrcp-profile = lv1      ; LumenVox MRCPv1 Server

;unimrcp-profile = nss2     ; Nuance MRCPv2 Server

;unimrcp-profile = nss1     ; Nuance MRCPv1 Server

 

; UniMRCP logging level.  Options are:

; EMERGENCY|ALERT|CRITICAL|ERROR|WARNING|NOTICE|INFO|DEBUG -->

log-level = DEBUG

 

; Preloaded grammars

[grammars]

;grammar-name = path-to-grammar-file

 

; MRCPv2 properties (recognizer and generic header fields)

; http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-20#section-9.4

[mrcpv2-properties]

Recognition-Timeout = 20000

No-Input-Timeout = 15000

 

; MRCPv1 properties (recognizer and generic header fields)

; http://tools.ietf.org/html/rfc4463#section-8.4

[mrcpv1-properties]

Recognition-Timeout = 20000

No-Input-Timeout = 15000

 

Set the unimrcp-profile directive to lv1 (LumenVox MRCPv1 Server).
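
Because the example contains several unimrcp-profile lines, all but one commented out, it can help to confirm which profile is active. The following Python sketch reads the file text with configparser and returns the active value, stripping the trailing ";" comment. The function name active_unimrcp_profile is hypothetical, and configparser is only an approximation of the Asterisk parser.

```python
import configparser


def active_unimrcp_profile(conf_text):
    """Return the uncommented unimrcp-profile value, or None if absent."""
    parser = configparser.ConfigParser(
        interpolation=None,
        strict=False,
        # Treat '; ...' after a value as a comment, as in the example file.
        inline_comment_prefixes=(";",),
    )
    parser.read_string(conf_text)
    return parser.get("general", "unimrcp-profile", fallback=None)


if __name__ == "__main__":
    with open("/etc/asterisk/res-speech-unimrcp.conf", encoding="utf-8") as handle:
        print(active_unimrcp_profile(handle.read()))
```

For the example above the function returns lv1, which should match the id of the MRCPv1 profile defined in lumenvox.xml.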

 

Step 3: Configure the MRCP client profile by editing the lumenvox.xml file

Here you can find a working lumenvox.xml example (adapt it to your environment):

 

<?xml version="1.0" encoding="UTF-8"?>

<!-- UniMRCP client document -->

<unimrcpclient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 

               xsi:noNamespaceSchemaLocation="../unimrcpclient.xsd" 

               version="1.0">

<settings>

    <!-- SIP MRCPv2 settings -->

    <sip-settings id="LumenVox-SIP-Settings">

      <!--

    Server IP address can explicitly be specified per "sip-settings". Otherwise, the server IP

        address defaults to "server-ip" set in the properties, which in turn defaults to "ip".

  -->

      <!-- <server-ip>10.10.0.1</server-ip> -->

      <server-port>5060</server-port>

      <!-- <force-destination>true</force-destination> -->

    </sip-settings>

    

    <!-- RTSP MRCPv1 settings -->

    <rtsp-settings id="LumenVox-RTSP-Settings">

      <!--

    Server IP address can explicitly be specified per "rtsp-settings". Otherwise, the server IP

        address defaults to "server-ip" set in the properties, which in turn defaults to "ip".

  -->

      <!-- <server-ip>10.10.0.1</server-ip> -->

      <server-ip>X.X.X.X</server-ip>

      <server-port>554</server-port>

      <!-- <force-destination>true</force-destination> -->

      <resource-location></resource-location>

      <resource-map>

        <param name="speechrecog" value="recognizer"/>

      </resource-map>

    </rtsp-settings>

  </settings>

  

  <profiles>

    <!-- LumenVox MRCPv2 profile -->

    <mrcpv2-profile id="lv2">

      <sip-uac>SIP-Agent-1</sip-uac>

      <mrcpv2-uac>MRCPv2-Agent-1</mrcpv2-uac>

      <media-engine>Media-Engine-1</media-engine>

      <rtp-factory>RTP-Factory-1</rtp-factory>

      <sip-settings>LumenVox-SIP-Settings</sip-settings>

      <rtp-settings>RTP-Settings-1</rtp-settings>

    </mrcpv2-profile>

    

    <!-- LumenVox MRCPv1 profile -->

    <mrcpv1-profile id="lv1">

      <rtsp-uac>RTSP-Agent-1</rtsp-uac>

      <media-engine>Media-Engine-1</media-engine>

      <rtp-factory>RTP-Factory-1</rtp-factory>

      <rtsp-settings>LumenVox-RTSP-Settings</rtsp-settings>

      <rtp-settings>RTP-Settings-1</rtp-settings>

    </mrcpv1-profile>

 

    <!-- More profiles may follow. -->

  </profiles>

</unimrcpclient>

 

  1. Set the <server-ip>X.X.X.X</server-ip> element to the Lumenvox Media Server IP.

  2. Set the <server-port>554</server-port> to the Lumenvox Media Server port.
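
To verify the values after editing, the profile file can be read back. The following Python sketch uses the standard xml.etree.ElementTree module to list the server address and port of each rtsp-settings element. The function name rtsp_endpoints is hypothetical; commented-out elements are ignored by the parser, so only the active values are reported.

```python
import xml.etree.ElementTree as ET


def rtsp_endpoints(xml_text):
    """Map each rtsp-settings id to its (server-ip, server-port) pair."""
    root = ET.fromstring(xml_text)
    endpoints = {}
    for settings in root.iter("rtsp-settings"):
        # findtext returns None when the element is missing.
        ip = settings.findtext("server-ip")
        port = settings.findtext("server-port")
        endpoints[settings.get("id")] = (ip, port)
    return endpoints


if __name__ == "__main__":
    path = "/usr/local/unimrcp/conf/client-profiles/lumenvox.xml"
    with open(path, "rb") as handle:
        print(rtsp_endpoints(handle.read()))
```

If the reported address is still X.X.X.X, the placeholder has not been replaced.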

Step 4: Configure unimrcpclient by editing the unimrcpclient.xml file

Here you can find a working unimrcpclient.xml example (adapt it to your environment):

<?xml version="1.0" encoding="UTF-8"?>

<!-- UniMRCP client document -->

<unimrcpclient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 

xsi:noNamespaceSchemaLocation="unimrcpclient.xsd" 

               version="1.0"

               subfolder="client-profiles">

  <properties>

    <!--

      If the attribute "type" is set to "auto", IP address is determined implicitly by the hostname.

      This is the default setting.

    -->

    <!-- <ip type="auto"/> -->

 

    <!--

      If the attribute "type" is set to "iface", IP address is determined by the specified name of

      network interface/adapter.

    -->

    <!-- <ip type="iface">eth0</ip>-->

 

    <!--

      IP address can also be specified explicitly.

    -->

    <ip>127.0.0.1</ip>

    

    <!-- <ext-ip>a.b.c.d</ext-ip> -->

 

    <!--

      Server IP address should be specified explicitly, unless the client and the server are located on

      the same host. The server IP address can also be specified per <sip-settings> and <rtsp-settings>.

    -->

    <!-- <server-ip>a.b.c.d</server-ip> -->

  </properties>

 

  <components>

    <!-- Factory of MRCP resources -->

    <resource-factory>

      <resource id="speechsynth" enable="true"/>

      <resource id="speechrecog" enable="true"/>

      <resource id="recorder" enable="true"/>

      <resource id="speakverify" enable="true"/>

    </resource-factory>

 

    <!-- SofiaSIP MRCPv2 signaling agent -->

    <sip-uac id="SIP-Agent-1" type="SofiaSIP">

      <!--

        By default, "ip" and "ext-ip" addresses, set in the properties, are used. These parameters can

        explicitly be specified per "sip-uas" by means of "sip-ip" and "sip-ext-ip" correspondingly.

      -->

      <!-- <sip-ip>10.10.0.1</sip-ip> -->