Voci is the ASR Engine for
Contact Center Solutions
Built specifically for the scale of the contact center, Voci’s modern, future-proof ASR engine rewrites the rules of what's possible for contact center solutions.
Industry-leading speed, efficiency, and time to results.
Leading out of the box, and tunable for any business or industry.
Open and Flexible
Numerous native integrations, and compatible with virtually any tech stack.
Auto-formatting, speaker separation, gender, emotion, sentiment, and more.
Safe and Secure
PCI DSS-compatible automatic redaction of sensitive information.
Both in-cloud and on-premises deployments available.
One Giant Leap Forward for Your Solution
Low Infrastructure Costs
Under the Hood:
- State-of-the-art NVIDIA® GPUs that increase performance and speech-to-text (STT) conversion speed
- Optimized cloud solutions powered by AWS
- Latest-generation Intel® Xeon® or AMD processors
- High-speed DDR4 SDRAM for high-bandwidth data transfers
- Containerized machines for VMware, AWS AMIs, and others
Adds automatic punctuation and capitalization in the output transcription file.
Formats spoken numbers into a human-readable numeric format, e.g. "twelve thirty" becomes "12:30", localized for a given language.
Allows users to upload most audio formats directly into the engine.
Callbacks enable another application to receive and directly interact with the produced transcripts, allowing for automated production workflows for speech transcription.
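As a rough illustration of the callback pattern described above, the following minimal sketch shows a receiving application that accepts a transcript posted to it over HTTP. The payload field names (`request_id`, `transcript`), the port, and the helper names are hypothetical and are not taken from Voci's documentation; the actual callback format may differ.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory list standing in for a downstream workflow (queue, database, etc.).
received = []


def process_callback(body: bytes) -> dict:
    """Parse a JSON callback body and hand the transcript to the workflow.

    The field names "request_id" and "transcript" are assumptions made
    for this sketch, not a documented payload format.
    """
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    record = {
        "request_id": payload.get("request_id"),
        "text": payload.get("transcript", ""),
    }
    received.append(record)
    return record


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts POSTed transcripts and replies 204 on success, 400 on bad input."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            process_callback(self.rfile.read(length))
            self.send_response(204)
        except ValueError:  # covers malformed JSON and bad UTF-8
            self.send_response(400)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server listening on the given port."""
    return HTTPServer(("", port), CallbackHandler)
```

Calling `make_server().serve_forever()` would start listening; the transcription job would then be submitted with this endpoint's URL as its callback address.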
Speaker Separation [Diarization]
Automatic speaker separation of customer and agent voices when both are recorded on one channel, enabling their utterances to be analyzed independently.
Classifies emotion and tracks its trend over time, based on acoustic features, for a given call, utterance, or audio file.
Uses a combination of acoustic emotion and text-based sentiment scores to determine if a given utterance is Positive, Improving, Neutral, Worsening, or Negative.
Classifies sentiment based on the text of the call or utterance as negative, mostly negative, neutral, mostly positive, or positive.
Scores words, utterances, and calls with the system's confidence in the transcription results.
Automatically predicts and tags the incoming language being spoken, and uses the corresponding language model for the duration of the call or audio file.
Acoustic AI model that predicts the estimated age of a given speaker.
Predictive model that identifies which audio channel (speaker) is the agent and which is the customer.
Classifies whether a given utterance is music, so that music and hold time are not sent to the engine for transcription.
Silence and Overtalk
Percentage of overtalk that occurred for a given call or audio file.
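To make the overtalk metric concrete, here is a minimal sketch of one way such a percentage could be computed from per-speaker speech segments. It assumes overtalk is defined as the share of total call duration during which both speakers are talking at once; the segment representation and function names are illustrative, and the engine's own definition may differ.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def merge(segments: List[Segment]) -> List[Segment]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overtalk_percentage(agent: List[Segment],
                        customer: List[Segment],
                        call_duration: float) -> float:
    """Percentage of the call during which both speakers talk simultaneously."""
    if call_duration <= 0:
        raise ValueError("call_duration must be positive")
    a, c = merge(agent), merge(customer)
    i = j = 0
    overlap = 0.0
    # Two-pointer sweep over both sorted segment lists.
    while i < len(a) and j < len(c):
        start = max(a[i][0], c[j][0])
        end = min(a[i][1], c[j][1])
        if end > start:
            overlap += end - start
        # Advance whichever segment finishes first.
        if a[i][1] < c[j][1]:
            i += 1
        else:
            j += 1
    return 100.0 * overlap / call_duration
```

For example, with agent speech at 0-10 s and 20-30 s, customer speech at 8-12 s and 25-40 s, and a 50-second call, the speakers overlap for 7 seconds, giving 14% overtalk.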
Credit Card Detection
Adds a tag to the transcript marking numbers predicted to be credit card numbers (even if they were redacted).
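For readers curious how card-like numbers can be spotted in text, the sketch below uses a widely known heuristic: find digit runs of plausible length and keep those that pass the Luhn checksum. This is a generic technique chosen for illustration, not a description of how the Voci engine makes its prediction; the regular expression and function names are assumptions.

```python
import re
from typing import List

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for idx, ch in enumerate(reversed(digits)):
        d = int(ch)
        if idx % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like_numbers(text: str) -> List[str]:
    """Return substrings of text that look like payment card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(match.group())
    return hits
```

A checksum pass only indicates that a number is well formed, so a production system would typically combine it with context from the surrounding conversation.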
Call Metrics related to the total number of words spoken, speaker turns, speech time, and number of substitutions per call and per speaker.