When It Has to Be Right: Transcription Accuracy and Tuning Speech Recognition

Lynwood Hines April 02, 2019

So, you’ve bought yourself a new automatic speech recognition (ASR) solution. You did your research, you found the system that works best for your business, you vetted the vendor, and finally you took the plunge. You’re sure that this is going to work perfectly, and all you need to do is turn it on and start feeding it call audio.

It would be nice if ASR worked like that. But there is no out-of-the-box solution that can handle all terms in all conversations, let alone terms that are specific to particular customers, industries, or locations. Just consider product names, which frequently use unusual terms or acronyms that rarely come up outside of those businesses (e.g., “KFC” or “Reebok”). And then there are accents and pronunciation errors. Really, there are a lot of factors that can make your ASR solution less accurate than it could be.

Tuning helps enhance accuracy by identifying company- and industry-specific language. This means giving the ASR algorithms sample audio data in order to train the system to produce the correct words and phrases. In principle, the idea is to give the ASR solution enough additional custom information that it can reach the level of accuracy your business needs.

There are a few basic methods available for tuning an ASR system, including substitution and custom language modeling. These methods for improving ASR accuracy can directly benefit your business, your customers, and the bottom line. But there’s more to it than we can cover in a blog post. Our whitepaper covers everything you need to know about tuning your ASR system.
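To make the idea of substitution a little more concrete, here is a minimal Python sketch. It assumes that substitution means replacing known misrecognitions in the transcript text with the intended term, which is one common reading and not necessarily how any particular ASR product does it. The table entries and the names `SUBSTITUTIONS`, `build_pattern`, and `apply_substitutions` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical substitution table (an assumption for illustration):
# lowercase misrecognized phrases mapped to the term the business expects.
SUBSTITUTIONS = {
    "k f c": "KFC",
    "kay eff see": "KFC",
    "re bock": "Reebok",
}


def build_pattern(table):
    """Compile one regex that matches any phrase in the table as whole words."""
    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(table, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


_PATTERN = build_pattern(SUBSTITUTIONS)


def apply_substitutions(transcript):
    """Return the transcript with every known misrecognition replaced."""
    return _PATTERN.sub(lambda m: SUBSTITUTIONS[m.group(1).lower()], transcript)


if __name__ == "__main__":
    print(apply_substitutions("i ordered from k f c wearing my re bock shoes"))
```

A table like this only fixes errors you already know about, and it is applied after recognition. Custom language modeling works differently: it changes what the recognizer expects to hear in the first place.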

Lynwood Hines

Lynwood has over 30 years of experience designing and developing software and electronic systems. He has been providing software development and customer support/integration services at Voci for the past 5 years. He is currently responsible for creating custom language models and software to automate the language modeling process.
