How to Make Speech Recognition Software That Works — and Keeps Working

Don Reaves May 28, 2019

In my experience, no significant software project is ever really "done." There's always something that can be improved, and customer requirements are always changing.

In addition, as new technologies emerge, it makes sense to take advantage of them. If you don't, your competitors will, and that puts you at a disadvantage. Markets change too, so software that was fine last year might seem antiquated now and need to be updated.

V-Blaze, for example, was built specifically to adapt to this kind of change. The original version used a custom FPGA (Field-Programmable Gate Array) at its core. This was state of the art at the time, and enabled Voci to create a product with unparalleled speed. Before long, GPU (Graphics Processing Unit) technology was adapted for more general computational use. This eventually made it possible to provide even greater speed, along with the flexibility needed to improve transcription accuracy.

The software that makes up V-Blaze was designed to be modular and scalable. Thanks to this design, it was possible to replace the underlying hardware platform with one built on multiple GPUs, ensuring that Voci's products would continue to track advances in technology.

Transcribing audio into text isn't all that V-Blaze does. There are a host of additional processing steps that produce a final transcript for a customer. One example is interpreting number words as digit strings in a format that's easy to recognize. A customer support person might say "call four one two six two one nine three one oh." V-Blaze would recognize that there are ten number words that should be assembled into a telephone number, 412-621-9310 (Voci's main number).

Surprisingly, this is not as simple as it might seem. People often use words like "oh" that are number words only when taken in context. They might also pause in unusual places, so the software has to account for the gaps and decide whether the words form one number or perhaps two separate ones.
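To make the problem concrete, here is a minimal sketch in Python of the kind of digit assembly described above. It is not V-Blaze's implementation: the function name, the `<pause>` token, and the rule that "oh" counts as zero only inside a run of digits are all assumptions made for illustration.

```python
# Illustrative only: a toy digit-assembly pass, not Voci's actual code.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
PAUSE = "<pause>"  # hypothetical marker a recognizer might emit for silence


def format_digit_runs(words):
    """Join runs of digit words; a run of exactly ten digits is
    formatted as a US-style telephone number."""
    out, run = [], []

    def flush():
        # Emit the digits collected so far, if any.
        if not run:
            return
        digits = "".join(run)
        if len(digits) == 10:
            digits = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
        out.append(digits)
        run.clear()

    for word in words:
        key = word.lower()
        if key == PAUSE:
            # Simplification: a pause never splits a number in this sketch.
            continue
        # Treat "oh" as zero only when a digit run is already in progress.
        if key in DIGIT_WORDS and (key != "oh" or run):
            run.append(DIGIT_WORDS[key])
        else:
            flush()
            out.append(word)
    flush()
    return " ".join(out)


print(format_digit_runs(
    "call four one two six two one nine three one oh".split()))
```

A real system would need more context than this. For instance, the sketch leaves a leading "oh" as a word even when digits follow it, and it always merges across a pause rather than deciding whether the speaker meant one number or two.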

Because Voci is based in the US, it was natural to develop this "number translation" software for English first. But it was obvious from the start that it would eventually be extended to cover other languages. So the software was designed to be rule-based in order to be easily adaptable. As long as a person knowledgeable about another language could provide the rules, a "translator" for that language could be built in relatively little time.
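The appeal of a rule-based design is that supporting a new language means supplying data rather than writing new logic. The sketch below illustrates that idea in Python. The `RULES` table, its layout, and the `translate_digits` function are hypothetical and say nothing about how V-Blaze actually stores its rules.

```python
# Illustrative only: per-language rule tables driving one shared routine.
RULES = {
    "en": {
        "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
        "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    },
    "es": {
        "cero": "0", "uno": "1", "dos": "2", "tres": "3", "cuatro": "4",
        "cinco": "5", "seis": "6", "siete": "7", "ocho": "8", "nueve": "9",
    },
}


def translate_digits(words, language):
    """Replace digit words with digits, joining adjacent digits,
    using whichever rule table matches the language code."""
    table = RULES[language]
    out = []
    prev_was_digit = False
    for word in words:
        digit = table.get(word.lower())
        if digit is None:
            out.append(word)
            prev_was_digit = False
        elif prev_was_digit:
            out[-1] += digit  # extend the number already being built
        else:
            out.append(digit)
            prev_was_digit = True
    return " ".join(out)


print(translate_digits("call four one two".split(), "en"))
print(translate_digits("llame al cuatro uno dos".split(), "es"))
```

Adding a third language here would mean adding one more entry to `RULES`; the function itself would not change. Real number grammars are far richer than a ten-word table, so this shows only the shape of the approach.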

Using a rule-based system provided other advantages. V-Blaze allows customers to provide rules that can produce company and product names specific to their own business. Language-specific punctuation and capitalization are another example. When a new application arises, the necessary software can be written and plugged directly into V-Blaze with a minimum of effort.
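As a rough illustration of customer-supplied rules, the Python sketch below maps spoken phrases to the written form a business wants to see in its transcripts. The rule format (a plain dictionary) and the function names are assumptions made for this example; the actual V-Blaze rule syntax is not described here.

```python
import re

# Illustrative only: a hypothetical customer rule set mapping
# spoken phrases to their preferred written form.
CUSTOMER_RULES = {
    "v blaze": "V-Blaze",
    "voci": "Voci",
}


def compile_rules(rules):
    """Turn each spoken phrase into a case-insensitive, whole-word pattern."""
    compiled = []
    for spoken, written in rules.items():
        # Allow any amount of whitespace between the words of a phrase.
        body = r"\s+".join(re.escape(part) for part in spoken.split())
        pattern = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
        compiled.append((pattern, written))
    return compiled


def apply_rules(text, compiled):
    """Apply every rule in order to the transcript text."""
    for pattern, written in compiled:
        # A function replacement keeps `written` literal, so backslashes
        # in a product name are not read as regex escapes.
        text = pattern.sub(lambda _match, w=written: w, text)
    return text


rules = compile_rules(CUSTOMER_RULES)
print(apply_rules("thank you for calling voci about v blaze", rules))
```

Because the rules are plain data, a customer could extend the table without touching the code that applies it.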

So although no software project is ever really complete, organizing the software well (modular, scalable, and rule-based) makes it possible to react to the market and keep products current, even as technology and requirements change.

Don Reaves

Don spent the last 40 years building tools for both customers and peers. He provided much of the "glue" in Voci products for the past 8 years. He hopes to spend his retirement pursuing his passion, sailing, as well as traveling the world with his wife, Celia.
