Like any high-tech field, STT (Speech-To-Text) borrows many everyday English terms and gives them its own special twist. Take transcription, for example, which basically means writing something down. That is not an especially exciting topic until you consider that transcription in an ASR (Automatic Speech Recognition) system means writing down sounds after they have been automatically assembled into words. The system processes input sound as it is spoken. Each uninterrupted chain of sounds that can be separately identified and analyzed (an utterance) yields one or more words. The words have individual meanings, but they can be selected and understood more accurately when considered in context with the other words being assembled and transcribed.
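To make the idea of an utterance a little more concrete, here is a minimal sketch of one simple way to cut a stream of sound into uninterrupted chains: treat any long enough run of quiet frames as a boundary. The function name, the threshold values, and the idea of a per-frame "energy" number are all assumptions made for illustration; this is not how any particular ASR product, including Voci's, actually segments audio.

```python
def split_utterances(frame_energies, silence_threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame index pairs, end exclusive, one per utterance.

    frame_energies: one loudness value per short audio frame (hypothetical input).
    A run of at least min_silence_frames quiet frames closes the current utterance.
    """
    utterances = []
    start = None       # index where the current utterance began, if any
    silent_run = 0     # number of consecutive quiet frames seen so far

    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            if start is None:
                start = i          # first loud frame opens an utterance
            silent_run = 0         # short pauses inside speech are forgiven
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the utterance at the first frame of this quiet run.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended mid-utterance; drop any trailing quiet frames.
        utterances.append((start, len(frame_energies) - silent_run))
    return utterances


energies = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0.05, 0.9, 0, 0]
print(split_utterances(energies))
```

In this toy input, the single quiet frame in the middle of the second burst is too short to count as a break, so the sounds on either side of it stay in the same utterance.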
Understanding words in context is one of the things that a language model provides during speech recognition. A language model is a set of complex data structures that identifies the common terms and expressions used within the model's target domain, such as a particular industry, along with the relationships between them and the probability of those relationships. The words in a language model form the basis of a dictionary for that model: the set of terms and phrases that are most likely to be encountered in the domain the model addresses. Language models also capture how far apart words or phrases can be and still be related, and so on. Three abilities work together here: identifying words from the sounds in the voice input being processed, identifying the other words that provide context for them, and using that context to identify the words themselves. This combination is one of the core aspects of the AI (artificial intelligence) that supports Voci products such as V-Blaze.
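For readers who like to see ideas in code, the sketch below shows the simplest kind of language model, a bigram model, choosing between two similar-sounding words based on the word that came before. Everything in it is an illustrative assumption: the tiny corpus, the function names, and the add-one smoothing are invented for this example, and production language models, including whatever Voci uses internally, are far larger and more sophisticated.

```python
from collections import Counter

# A toy "domain" corpus, standing in for text from a call-center setting.
corpus = [
    "please check your account balance",
    "your account balance is low",
    "i want to check my balance",
    "the count was wrong",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    context_counts = Counter()  # times a word appears with a word after it
    bigram_counts = Counter()   # times a specific (previous, next) pair appears
    vocab = set()               # the model's "dictionary"
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        vocab.update(words)
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab):
    """Estimate P(word | prev) with add-one smoothing so unseen pairs are not zero."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def pick_word(prev, candidates, model):
    """Choose the candidate the model finds most likely after prev."""
    return max(candidates, key=lambda w: bigram_prob(prev, w, *model))


model = train_bigrams(corpus)
print(pick_word("your", ["account", "count"], model))
print(pick_word("the", ["account", "count"], model))
```

Given the same pair of sound-alike candidates, the model prefers "account" after "your" and "count" after "the", because those are the pairings it saw in its domain text. A bigram model only looks one word back; widening that window is one way to capture relationships between words that sit farther apart.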
With the basic definitions of what and how covered, the next few blog posts will explore the technology that supports ASR systems in more depth and give specific examples of how Voci ASR can benefit particular applications and industries. These scenarios include transcribing and providing insights into customer service calls, generating subtitles for audio and video content, and much more. Talk to you soon!