Voci is the ASR Engine for Contact Center Solutions

Built specifically for the scale of the contact center, Voci’s modern, future-proof ASR engine rewrites the rules of what's possible for contact center solutions.

Lightning Fast

Industry-leading speed, efficiency, and time to results.

Highly Accurate

Leading accuracy out of the box, and tunable for any business or industry.

Open and Flexible

Numerous native integrations, and compatible with virtually any tech stack.

Smart Transcription

Auto-formatting, speaker separation, gender, emotion, sentiment, and more.

Safe and Secure

PCI DSS-compatible automatic redaction of sensitive information.

Deployment Options

Both in-cloud and on-premises deployments available.

One Giant Leap Forward for Your Solution

Leading Accuracy

Enjoy leading out-of-the-box transcription accuracy, and further tune Voci's advanced speech models to your specific use case.

Data-rich Transcripts

Receive data-rich, auto-secure, searchable transcripts for every single contact center call, greatly expanding the insights you can draw from them.

Low Infrastructure Costs

Switch on the most efficient transcription in existence, and see massively reduced hardware needs, datacenter space, and cost of operations.

Under the Hood:

State-of-the-art NVIDIA® GPUs that increase performance and speech-to-text (STT) conversion speed
Optimized cloud solutions powered by AWS
Latest-generation Intel® Xeon® or AMD processors
High-speed DDR4 SDRAM for high-bandwidth data transfers
Containerized machines for VMware, AWS AMIs, and others


Transcription Features:

Auto Punctuation

Adds automatic punctuation and capitalization in the output transcription file.

Number Formatting

Formats spoken numbers into a human-readable numeric format, e.g. "twelve thirty" becomes "12:30", localized for a given language.
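As a rough illustration of what this kind of formatting involves, the sketch below converts a spoken English clock time into digits. It is not Voci's implementation: the `format_spoken_time` function and its word tables are hypothetical, cover only a small slice of English, and ignore localization entirely.

```python
# Hypothetical word tables for this sketch; a real formatter would be
# far more complete and locale-aware.
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}


def format_spoken_time(phrase):
    """Return 'H:MM' for phrases like 'twelve thirty', or None if unrecognized."""
    words = phrase.lower().split()
    # The first word must be an hour between one and twelve.
    if len(words) < 2 or words[0] not in UNITS or UNITS[words[0]] > 12:
        return None
    hour = UNITS[words[0]]
    rest = words[1:]
    if rest == ["o'clock"]:
        minutes = 0
    elif len(rest) == 1 and rest[0] in TENS:
        minutes = TENS[rest[0]]                      # e.g. "thirty"
    elif len(rest) == 1 and rest[0] in UNITS and UNITS[rest[0]] >= 10:
        minutes = UNITS[rest[0]]                     # e.g. "fifteen"
    elif (len(rest) == 2 and rest[0] in TENS
          and rest[1] in UNITS and UNITS[rest[1]] < 10):
        minutes = TENS[rest[0]] + UNITS[rest[1]]     # e.g. "forty five"
    else:
        return None
    return f"{hour}:{minutes:02d}"
```

Under these assumptions, `format_spoken_time("twelve thirty")` returns `"12:30"`, and a phrase the tables do not cover returns `None` so the original text can be left untouched.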

Transcoding

Accepts most audio formats natively, so users can upload audio directly to the engine without converting it first.

Callbacks

Callbacks let another application receive and directly interact with the produced transcripts, enabling automated production workflows for speech transcription.
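A receiving application for such a callback might look like the following minimal sketch, built only on the Python standard library. The payload shape is an assumption made for illustration: the `requestid` and `utterances` field names, the `extract_transcript` helper, and the port number are hypothetical and are not taken from Voci's API documentation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_transcript(body):
    """Parse a JSON callback body and return (request_id, full_text).

    The 'requestid' and 'utterances' keys are assumed for this sketch.
    """
    payload = json.loads(body)
    text = " ".join(u.get("text", "") for u in payload.get("utterances", []))
    return payload.get("requestid"), text


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        request_id, text = extract_transcript(self.rfile.read(length))
        # Hand the transcript to a downstream workflow here.
        print(f"transcript {request_id}: {text}")
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Start a blocking HTTP server that accepts transcript callbacks."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the parsing in `extract_transcript` separates it from the HTTP plumbing, so the workflow logic can be exercised without running a server.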

Speaker Separation [Diarization]

Automatic speaker separation of customer and agent voices when both are recorded on one channel, enabling their utterances to be analyzed independently.

Acoustic Emotion

Classifies emotion and tracks its trend over time, based on acoustic features, for a given call, utterance, or audio file.

Emotional Intelligence

Uses a combination of acoustic emotion and text-based sentiment scores to determine if a given utterance is Positive, Improving, Neutral, Worsening, or Negative.

Sentiment Analysis

Classifies sentiment based on the text of the call or utterance as negative, mostly negative, neutral, mostly positive, or positive.

Confidence Scores

Scores words, utterances, and calls with the system's confidence in the transcription results.

Language Identification

Automatically predicts and tags the language being spoken in the incoming audio, and uses the corresponding language model for the duration of the call or audio file.

Age ID

Acoustic AI model that estimates the age of a given speaker.

Agent ID

Predictive model that identifies which audio channel (speaker) is the agent and which is the customer.

Music Detection

Classifies each utterance as music or not music, so that music and hold time are not sent to the engine for transcription.

Silence and Overtalk

Percentage of overtalk that occurred in a given call or audio file.
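One plausible way to compute such a figure from speaker-separated segments is sketched below. The definition used here (time during which both speakers talk at once, divided by total call duration) is an assumption, as are the `overtalk_percentage` name and the `(start, end)` segment format; the engine's own metric may be defined differently.

```python
def overtalk_percentage(agent_segments, customer_segments, call_duration):
    """Percent of call_duration during which both speakers talk at once.

    Each segment is a (start, end) tuple in seconds. Segments belonging to
    the same speaker are assumed not to overlap each other.
    """
    if call_duration <= 0:
        raise ValueError("call_duration must be positive")
    overlap = 0.0
    for a_start, a_end in agent_segments:
        for c_start, c_end in customer_segments:
            # Length of the intersection of the two intervals, or 0 if disjoint.
            overlap += max(0.0, min(a_end, c_end) - max(a_start, c_start))
    return 100.0 * overlap / call_duration
```

For example, with agent segments `[(0, 5), (10, 15)]`, a customer segment `[(4, 11)]`, and a 20-second call, the two speakers overlap for 2 seconds, giving 10.0 percent.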

Credit Card Detection

Adds a tag to the transcript marking numbers predicted to be credit card numbers, even if they were redacted.
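To give a sense of how card-like numbers can be spotted in transcript text, the sketch below combines a digit-run pattern with the standard Luhn checksum used by payment card numbers. This is a generic technique swapped in for illustration, not a description of Voci's predictive model, and the `luhn_valid` and `tag_card_numbers` names are hypothetical.

```python
import re

# Runs of 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tag_card_numbers(text):
    """Return (start, end) character spans of digit runs that pass Luhn."""
    spans = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            spans.append(match.span())
    return spans
```

The returned spans could be used either to attach a tag or to replace the digits with a redaction marker. A checksum test alone produces false positives and misses numbers that are spoken piecemeal, which is one reason a production system would rely on a trained model instead.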

Call Analysis

Call metrics covering the total number of words spoken, speaker turns, speech time, and number of substitutions, per call and per speaker.
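Several of these metrics can be derived from a speaker-separated transcript, as in the hedged sketch below. The utterance format (dictionaries with `speaker`, `start`, `end`, and `text` keys) and the `call_metrics` function are assumptions for illustration, a "turn" is taken to mean a change of speaker, and substitutions are left out because their definition is not given here.

```python
from collections import defaultdict


def call_metrics(utterances):
    """Compute word count, speech time, and speaker turns per speaker and per call."""
    per_speaker = defaultdict(lambda: {"words": 0, "speech_time": 0.0, "turns": 0})
    previous = None
    # Process utterances in time order so that turn counting is meaningful.
    for u in sorted(utterances, key=lambda u: u["start"]):
        stats = per_speaker[u["speaker"]]
        stats["words"] += len(u["text"].split())
        stats["speech_time"] += u["end"] - u["start"]
        if u["speaker"] != previous:   # a new turn starts when the speaker changes
            stats["turns"] += 1
            previous = u["speaker"]
    totals = {
        key: sum(s[key] for s in per_speaker.values())
        for key in ("words", "speech_time", "turns")
    }
    return {"per_speaker": dict(per_speaker), "total": totals}
```

Consecutive utterances from the same speaker count as a single turn under this definition, so an agent who speaks, is answered, and speaks again is credited with two turns.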

Ready to see the world’s most efficient ASR in action?

Access our API or request a demo now.

