Essential Components to Consider When Introducing a Speech Solution

Wayne Ramprashad February 24, 2021

Introducing a speech analytics solution can give your contact center a competitive edge, and when you begin planning one, automated speech recognition (ASR) technology is the most common starting point. However, ASR is only one component of a complete speech solution: audio access, call orchestration, call metadata, and voice analysis are also vital pieces of the puzzle. To help you prepare for implementing speech analytics in your contact center, let’s take a closer look at each area that will require your attention.

Audio

To access audio, contact centers can either export call recordings from a call recorder or tap directly into the telephony system. Pulling recordings requires less development, but it limits you to post-call insights rather than real-time ones. Call recorder vendors also charge high fees for audio access, and the recordings you receive are often low quality, which reduces the insights you can extract from them.

On the flip side, tapping into a telephony system is a higher-effort undertaking. In return, you gain higher-quality audio for more advanced analytics, avoid recording vendor fees, and enable direct-to-transcript (DTT) solutions that give agents and managers real-time guidance and insights as calls occur. DTT solutions can also simplify your entire speech infrastructure by combining call access, call capture, call orchestration, and speech-to-text into one unified process and then sending the transcripts straight to your analytics application.
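
If you start with recorder exports, it is worth checking what those files actually contain before judging how much analysis they can support. The sketch below uses Python's standard `wave` module to report a WAV recording's sample rate, channel count, sample width, and duration, and flags properties that commonly limit speech analytics. The 16 kHz `min_rate_hz` threshold and the warning rules are illustrative assumptions, not a vendor requirement (classic telephony audio is 8 kHz narrowband). Note that `wave` reads only uncompressed PCM WAV files, so mu-law exports or MP3s would need a different reader.

```python
import wave
from dataclasses import dataclass


@dataclass
class AudioProfile:
    sample_rate_hz: int
    channels: int
    sample_width_bytes: int
    duration_s: float


def profile_wav(path: str) -> AudioProfile:
    """Read basic format details from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return AudioProfile(
            sample_rate_hz=rate,
            channels=wav.getnchannels(),
            sample_width_bytes=wav.getsampwidth(),
            duration_s=frames / rate if rate else 0.0,
        )


def quality_warnings(profile: AudioProfile, min_rate_hz: int = 16000) -> list[str]:
    """Return human-readable warnings; min_rate_hz is an assumed, adjustable threshold."""
    warnings = []
    if profile.sample_rate_hz < min_rate_hz:
        warnings.append(
            f"sample rate {profile.sample_rate_hz} Hz is below {min_rate_hz} Hz"
        )
    if profile.sample_width_bytes < 2:
        warnings.append("8-bit samples: expect reduced dynamic range")
    if profile.channels == 1:
        # Assumption: a mono file likely mixes agent and caller on one channel
        warnings.append("mono: agent and caller are likely mixed on one channel")
    return warnings
```

Running `quality_warnings(profile_wav("call.wav"))` over a sample of exported files gives a quick, rough picture of whether recorder audio is good enough for the analytics you have in mind.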

Call Orchestration

Call orchestration is the process of queuing audio and ingesting it into an ASR engine. This process is unique to every system, and any post-call transcription will require custom code that sends audio to the ASR engine via an API. One common method for post-call ingestion is to write audio files to a directory and build software that uploads each new file added to that directory to the ASR engine. Alternatively, you can connect to the call recording system’s API to extract the audio and send it to the ASR engine.
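
The directory-based approach can be sketched as a simple polling loop. This is a minimal illustration, not a production design: `Uploader` is a hypothetical placeholder for whatever upload or batch-transcription call your ASR vendor's API provides, and the set of audio suffixes is an assumption about what your recorder writes. A file is only marked as processed after its upload succeeds, so failures are retried on the next pass.

```python
import time
from pathlib import Path
from typing import Callable

# Hypothetical uploader type: in a real deployment this would wrap your ASR
# vendor's upload/transcription API call and return True on success.
Uploader = Callable[[Path], bool]

# Assumed audio formats written by the recorder; adjust to your environment.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def find_new_files(watch_dir: Path, seen: set[Path]) -> list[Path]:
    """Return unprocessed audio files in watch_dir, oldest modification time first."""
    candidates = [
        p for p in watch_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES and p not in seen
    ]
    return sorted(candidates, key=lambda p: p.stat().st_mtime)


def poll_once(watch_dir: Path, seen: set[Path], upload: Uploader) -> int:
    """Send each new file to the ASR engine; mark it seen only if the upload succeeded."""
    uploaded = 0
    for path in find_new_files(watch_dir, seen):
        if upload(path):
            seen.add(path)
            uploaded += 1
    return uploaded


def watch(watch_dir: Path, upload: Uploader, interval_s: float = 5.0) -> None:
    """Poll the directory forever; failed uploads are retried on the next pass."""
    seen: set[Path] = set()
    while True:
        poll_once(watch_dir, seen, upload)
        time.sleep(interval_s)
```

Two gaps a real system would need to close: a poller can pick up a file that is still being written (a common mitigation is to have the writer save under a temporary name and rename it when finished), and the in-memory `seen` set is lost on restart, so processed files are usually moved elsewhere or recorded in a database.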

Automated Speech Recognition

Contact center ASR technology has come a long way in recent years. Today, best-in-class ASR technology combines low latency and fast transcription speeds with robust features and flexible deployment and integration options. For product completeness, ASR technology should offer real-time and post-call transcription with punctuation, capitalization, number formatting, metadata, security features, support for multiple languages, and custom tuning. Use tuning to improve word recognition accuracy, particularly for product, brand, and industry terms. Tuning can also help you better assess call and agent sentiment and adjust what is redacted for compliance and security. To reduce the time you spend on custom tuning, some ASR engines offer built-in language models for specific industries and applications.
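
Redaction is normally configured in the ASR engine itself, and each vendor exposes it differently. As a rough, engine-independent illustration of what "adjusting what is redacted" can mean, the sketch below applies a post-processing pass over transcript text: it masks long runs of digits (such as spoken card or account numbers rendered as digits by number formatting) plus any extra terms you supply. The `min_digits` default of 6 and the function names are assumptions for illustration only.

```python
import re
from typing import Iterable


def digit_run_pattern(min_digits: int) -> re.Pattern:
    """Match at least `min_digits` digits, optionally separated by single spaces or hyphens."""
    if min_digits < 1:
        raise ValueError("min_digits must be at least 1")
    return re.compile(rf"\d(?:[ -]?\d){{{min_digits - 1},}}")


def redact(
    text: str,
    min_digits: int = 6,
    extra_terms: Iterable[str] = (),
    mask: str = "[REDACTED]",
) -> str:
    """Mask long digit runs plus caller-supplied terms (whole words, case-insensitive)."""
    redacted = digit_run_pattern(min_digits).sub(mask, text)
    for term in extra_terms:
        redacted = re.sub(rf"\b{re.escape(term)}\b", mask, redacted, flags=re.IGNORECASE)
    return redacted
```

For example, `redact("my card is 4111 1111 1111 1111 thanks")` masks the whole card number while leaving short numbers such as extensions untouched. A pattern this simple will also mask unrelated numbers spoken back to back, which is one reason engine-level redaction tuned to your calls is usually preferable.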

When comparing ASR engines, look for a vendor with a record of improving speech-to-text accuracy, lowering costs, adding new language models, and enhancing ease of use over time. 
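
A common way to track speech-to-text accuracy, whether across competing engines or across versions of one engine, is word error rate (WER): the number of word substitutions S, deletions D, and insertions I needed to turn the ASR output into a human reference transcript, divided by the number of reference words N, so WER = (S + D + I) / N. The sketch below computes it with a word-level edit distance. The normalization step (lowercasing and dropping punctuation so formatting differences are not counted as errors) is an assumption; whatever rule you choose, apply it identically to every engine and use the same test calls.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and strip punctuation (an assumed normalization rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "I want to cancel my account" and the ASR output "i want to cancel the account.", one substitution over six reference words gives a WER of about 0.167. Because insertions are counted, WER can exceed 1.0 on very poor output.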

Call Metadata

Creating a voice-of-the-customer speech analytics tool requires call metadata such as agent name, agent team, and caller number. (This metadata is distinct from the metadata your ASR engine may provide, such as gender, sentiment, and emotion.) To link call metadata with call transcripts, contact centers must first determine how to extract call metadata from the computer telephony integration (CTI) system. Your ASR engine or analytics software may offer a solution that meets your needs and works with your CTI system. If not, you will have to build a custom ETL (extract, transform, load) system that uses an internal database to relate each call recording or transcript to its metadata.
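
If you do build a custom ETL system, the core of the "relate" step is a join on a shared call identifier. The sketch below uses SQLite purely for illustration; the table and column names are hypothetical, and it assumes both the CTI export and the transcripts carry a common `call_id`. Some environments lack such an ID and must match on timestamps and agent extensions instead.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema: one row of CTI metadata and one transcript per call
SCHEMA = """
CREATE TABLE IF NOT EXISTS call_metadata (
    call_id       TEXT PRIMARY KEY,
    agent_name    TEXT,
    agent_team    TEXT,
    caller_number TEXT
);
CREATE TABLE IF NOT EXISTS transcripts (
    call_id TEXT PRIMARY KEY,
    text    TEXT NOT NULL
);
"""


def load(
    conn: sqlite3.Connection,
    metadata_rows: Iterable[tuple],
    transcript_rows: Iterable[tuple],
) -> None:
    """Load CTI metadata and transcripts; re-delivered rows overwrite older ones."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO call_metadata "
        "(call_id, agent_name, agent_team, caller_number) VALUES (?, ?, ?, ?)",
        metadata_rows,
    )
    conn.executemany(
        "INSERT OR REPLACE INTO transcripts (call_id, text) VALUES (?, ?)",
        transcript_rows,
    )
    conn.commit()


def enriched_transcripts(conn: sqlite3.Connection) -> list[tuple]:
    """Join transcripts to metadata; LEFT JOIN keeps transcripts whose metadata hasn't arrived."""
    return conn.execute(
        """
        SELECT t.call_id, m.agent_name, m.agent_team, m.caller_number, t.text
        FROM transcripts AS t
        LEFT JOIN call_metadata AS m ON m.call_id = t.call_id
        ORDER BY t.call_id
        """
    ).fetchall()
```

The left join is a deliberate choice: CTI data and transcripts often arrive at different times, and it is usually better to surface a transcript with missing metadata than to drop it silently.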

Voice Analysis

Some ASR engines provide a call analyzer for organizing, filtering, searching, classifying, visualizing, and reporting on call data. With a call analyzer, you can search transcripts for specific words or phrases and automatically tag calls that meet designated criteria. You can also view statistics and trends for metrics such as call volume, call duration, in-call silence, and agent or client emotions.
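
To make rule-based tagging and in-call silence concrete, here is a small sketch of the kind of logic a call analyzer applies. It assumes the ASR output has been parsed into speaker-labeled utterances with start and end times; the `Call`/`Utterance` data model and the `TAG_RULES` phrases are hypothetical examples. Tagging uses plain substring matching, and silence is computed as the share of the call not covered by any utterance, with overlapping speech counted once.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # e.g. "agent" or "customer"
    start_s: float
    end_s: float
    text: str


@dataclass
class Call:
    call_id: str
    duration_s: float
    utterances: list[Utterance]


# Hypothetical tagging rules: tag name -> phrases that trigger it
TAG_RULES = {
    "cancellation_risk": ["cancel my", "close my account"],
    "escalation": ["supervisor", "manager"],
}


def tag_call(call: Call, rules: dict[str, list[str]] = TAG_RULES) -> set[str]:
    """Return every tag whose phrases appear anywhere in the (lowercased) transcript."""
    transcript = " ".join(u.text.lower() for u in call.utterances)
    return {tag for tag, phrases in rules.items() if any(p in transcript for p in phrases)}


def silence_ratio(call: Call) -> float:
    """Fraction of the call with no one speaking; overlapping speech is merged first."""
    if call.duration_s <= 0:
        return 0.0
    spoken = 0.0
    cur_start = cur_end = None
    for u in sorted(call.utterances, key=lambda u: u.start_s):
        if cur_end is None or u.start_s > cur_end:
            # Gap before this utterance: close out the previous merged interval
            if cur_end is not None:
                spoken += cur_end - cur_start
            cur_start, cur_end = u.start_s, u.end_s
        else:
            # Overlaps the current interval: extend it
            cur_end = max(cur_end, u.end_s)
    if cur_end is not None:
        spoken += cur_end - cur_start
    return max(0.0, 1.0 - spoken / call.duration_s)
```

For a 60-second call with speech covering 0 to 20 seconds and 30 to 40 seconds, `silence_ratio` returns 0.5. Aggregating these per-call values by agent or team is what produces the trend views described above.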

Whether or not your ASR engine includes a call analyzer, you will need to connect it to your analytics or business intelligence application via an API. If your ASR engine is designed for ease of integration, this process should be straightforward.

A Complete Solution for Delivering Actionable Insights

While a call analyzer is useful, leading contact centers do more than review dashboards and metrics. The greatest opportunity lies in using call transcripts for predictive and prescriptive analytics. These advanced analytics applications allow you to anticipate customer needs and recommend next best actions to drive better customer experiences, smarter business decisions, and significant cost savings. For more information about what it takes to create a speech solution that helps you achieve your goals, review our full white paper on Building a Speech Solution for the Contact Center and get in touch to see how our experts can advise you on your entire speech solution infrastructure!

Wayne Ramprashad

Wayne has strong expertise in the architecture, development, and operations of large-scale speech-driven data systems. Prior to joining Voci in 2016, he held senior positions at several Fortune 500 companies, including Comcast, IBM, Lucent, and AOL Time Warner. At Comcast, Wayne directed an enterprise-wide strategy for self-service solutions in support of customer service operations. He earned a B.S. degree in Joint Applied Mathematics and Computer Science from the University of Waterloo, Canada.
