Saturday, May 29, 2010

WEEK 6 (10-14 MAY)


This is my 6th week at Dare BPO. This week I continued my research on Text-to-Speech (TTS), Speech-to-Text and predictive dialing.

First, I shall discuss Text-to-Speech. TTS is the artificial production of human speech: it transforms text into speech in real time, reading written information aloud in a smooth, natural-sounding voice. The automatic intonation reflects the meaning of the text, with respect to pauses, breath groups, punctuation and context. The most important qualities of a speech synthesis system are naturalness and intelligibility. The computer system used to achieve this is called a speech engine. You can try a TTS application at http://www2.research.att.com/~ttsweb/tts/demo.php

Example of TTS application

In order to reproduce the natural sound of each language, a narrator records a series of texts which contain every possible sound in the chosen language. These recordings are then sliced and organized into an acoustic database. During database creation, all recorded speech is segmented into some or all of the following: diphones, syllables, morphemes, words, phrases, and sentences. To reproduce words from a text, the TTS system begins by carrying out a linguistic analysis that transcribes the written text into phonetic text. A grammatical and syntactic analysis then enables the system to decide how to pronounce each word so that the sense of the sentence is preserved. This is called prosody; it gives the sentence its rhythm and intonation. Finally, the system produces information associating the phonetic writing with the tone and required length of the pronunciation. The chain of analysis ends here, and sound is generated by selecting the best units stored in the acoustic database.
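
To make the idea of diphone units more concrete, here is a minimal Python sketch of the first steps: text to phonemes, then phonemes to diphones. The tiny lexicon and the function names are my own made-up examples, not part of any real speech engine, and a real system would also handle prosody and pick among many recorded versions of each unit.

```python
# Toy illustration of concatenative synthesis: text -> phonemes -> diphones.
# LEXICON is a hypothetical, hand-written example, not a real pronunciation dictionary.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def to_phonemes(text):
    """Look up each word in the toy lexicon; "_" marks a pause at word boundaries."""
    phonemes = ["_"]
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
        phonemes.append("_")
    return phonemes

def to_diphones(phonemes):
    """A diphone covers the transition from one phoneme to the next."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]

units = to_diphones(to_phonemes("hello world"))
print(units)
```

Each name in the resulting list stands for one recorded unit that would be fetched from the acoustic database and joined to its neighbours.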


A TTS capability for a computer refers to the ability to play back text in a spoken voice. The Text-to-Speech tab located in the Speech Control Panel presents the options for each TTS engine. Below are the steps for configuration: 

Step 1: Set Up Speakers.
Step 2: Select an Audio Output Device.
Step 3: Set Audio Output Device Options.
Step 4: Configure Text-to-Speech Options.

The next topic is Speech-to-Text. Speech-to-Text (also known as automatic speech recognition or computer speech recognition) converts spoken words to text. The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker, as is the case for most desktop recognition software.

Speech recognition applications include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft control (usually termed Direct Voice Input).


Speech recognition fundamentally functions as a pipeline that converts PCM (Pulse Code Modulation) digital audio from a sound card into recognized speech. The elements of the pipeline are:
  1. Transform the PCM digital audio into a better acoustic representation.
  2. Apply a "grammar" so the speech recognizer knows what phonemes to expect. A grammar could be anything from a context-free grammar to a full natural language.
  3. Figure out which phonemes are spoken.
  4. Convert the phonemes into words.

When a person speaks, vibrations are created. Speech recognition technology converts these vibrations, which are analog signals, into digital form by means of an analog-to-digital converter (ADC). Sound is digitized by measuring it at regular intervals. The sound is filtered into different frequency bands and normalized so that it attains a constant volume level. It is then checked against the sound templates already stored in the system. The next step in the speech recognition procedure is dividing the signal into segments that range from a few hundredths to a few thousandths of a second. These segments are matched with phonemes that are already stored in the system. Phonemes are specific sounds that are understood by people speaking a particular language.
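
The digitizing, normalizing and segmenting steps can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a pure sine tone stands in for a voice, the 8000 Hz sample rate and the 10 ms segment length are values I picked, and the filtering into frequency bands is left out.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed telephone-quality rate

def sample_tone(freq_hz, seconds, amplitude):
    """Measure a sine wave at regular intervals, as an ADC would."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def normalize(samples, target_peak=1.0):
    """Scale the signal so that its loudest sample reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    return [s * target_peak / peak for s in samples]

def split_frames(samples, frame_ms=10):
    """Cut the signal into fixed-length segments of frame_ms milliseconds."""
    size = SAMPLE_RATE * frame_ms // 1000
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# A quiet 440 Hz tone lasting 0.1 s, brought up to a constant peak level.
signal = normalize(sample_tone(440, 0.1, amplitude=0.2))
frames = split_frames(signal)
print(len(signal), len(frames), len(frames[0]))
```

At the assumed rate, each 10 ms segment holds 80 samples. A real recognizer would then turn each segment into acoustic features before matching it against stored phonemes.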


The final topic is predictive dialing. Predictive dialing uses a computer-based system that automatically dials groups of telephone numbers and then passes calls to available operators or agents in a call center once the calls are connected. The most common use of predictive dialing is in call centers which make large numbers of calls, such as those run by telemarketing companies.

Predictive dialing is far more advanced than using an autodialer because it monitors the calls it makes to see how they are answered. If a call goes unanswered, is met with a busy signal or answering machine, or reaches a fax machine, the predictive dialer immediately ends the call. Only calls that are answered by a live person are put through to an agent. Productivity is therefore increased because agents do not have to listen to unanswered calls or wait for someone to pick up. Predictive dialing is so named because it predicts when agents will become available to take a new call, and dials calls in advance. When a person answers the phone, the predictive dialer puts the call through to an agent, although there is sometimes a brief delay as the dialer attempts to determine whether the person's voice is a recording, in which case the call is ended.
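
As a rough illustration of the "predictive" part, the sketch below estimates how many numbers to dial in advance. The function name, its parameters and the simple rule (free agents divided by the fraction of calls answered by a live person) are my own assumptions for the example; real dialers use more elaborate statistics and must also limit the number of answered calls that find no agent free.

```python
import math

def calls_to_dial(agents_free_soon, answer_rate, max_lines):
    """Estimate how many numbers to dial so live answers roughly match free agents.

    agents_free_soon: agents expected to be available when the calls connect
    answer_rate: fraction of dialed calls answered by a live person (0 < rate <= 1)
    max_lines: upper limit on outgoing telephone lines
    """
    if agents_free_soon <= 0:
        return 0
    if not 0 < answer_rate <= 1:
        raise ValueError("answer_rate must be in (0, 1]")
    # If only 1 call in 4 reaches a person, dial about 4 numbers per free agent.
    return min(max_lines, math.ceil(agents_free_soon / answer_rate))

print(calls_to_dial(agents_free_soon=4, answer_rate=0.25, max_lines=50))
```

With 4 agents about to be free and 1 call in 4 reaching a person, this rule suggests dialing 16 numbers.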


In call centers and other applications where a predictive dialer is employed, the telephone numbers of the people and businesses to be called are stored on a network server. All agents are linked to the server. The network is also linked to the predictive dialer, which can be either a hard dialer or a soft dialer. With agents at work, the server and/or the dialer start dialing the numbers. The calls are then managed by the dialer. In case of silence at the other end, the dialer hangs up. From the remaining calls, the dialer screens out busy, unanswered, and answering machine calls. Only the live calls are put through to the agents. The instant an agent is connected to a call, all information pertaining to the call is displayed on the agent's screen.
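
The screening step can be pictured with the small Python sketch below. The outcome names, the `route_calls` function, the phone numbers and the agent names are all hypothetical, chosen only to show the flow; an actual dialer detects these outcomes from the telephone line and takes its data from the call center's server.

```python
from collections import deque
from enum import Enum

class Outcome(Enum):
    LIVE = "live"
    BUSY = "busy"
    NO_ANSWER = "no_answer"
    ANSWERING_MACHINE = "answering_machine"
    FAX = "fax"
    SILENCE = "silence"

def route_calls(results, agents):
    """Connect live calls to waiting agents and hang up on everything else.

    results: list of (phone_number, Outcome) pairs reported by the dialer
    agents: names of available agents, in the order they became free
    Returns (connections, dropped), where connections maps number -> agent.
    """
    free = deque(agents)
    connections, dropped = {}, []
    for number, outcome in results:
        if outcome is Outcome.LIVE and free:
            connections[number] = free.popleft()
        else:
            # Screened-out calls, plus live calls that find no free agent.
            dropped.append(number)
    return connections, dropped

results = [
    ("555-0101", Outcome.BUSY),
    ("555-0102", Outcome.LIVE),
    ("555-0103", Outcome.ANSWERING_MACHINE),
    ("555-0104", Outcome.LIVE),
]
connections, dropped = route_calls(results, ["agent_a", "agent_b"])
print(connections)
print(dropped)
```

In this example the two live calls go to the two agents in the order they became free, and the busy and answering machine calls are dropped.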
