Word error rate (WER) measures the accuracy of automatic speech recognition (ASR) as the number of word-level errors in a transcript relative to the number of words in a reference transcript. It is the standard metric for evaluating ASR system performance.
How is WER calculated?
WER compares the ASR output to a reference transcript, counting substitutions (wrong words), insertions (extra words), and deletions (missing words). The formula is: WER = (Substitutions + Insertions + Deletions) / Total Words in Reference. Lower WER indicates better accuracy. Because insertions count as errors but do not add to the reference word count, WER can exceed 100%.
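The formula can be sketched in a few lines of Python. This is a minimal illustration, not a production scorer: the function name `word_error_rate` is hypothetical, and it assumes that lowercasing and splitting on whitespace is adequate text normalization. It finds the smallest combined number of substitutions, insertions, and deletions with a word-level edit distance, then divides by the reference length.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + I + D) / N, where N is the number of reference words."""
    # Assumption: lowercase + whitespace split is enough normalization here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = minimum edits to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer_example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here `wer_example` is 2 / 6, or about 33%. Real evaluations usually also normalize punctuation, numerals, and contractions before scoring, which this sketch leaves out.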
Why does WER matter?
ASR accuracy affects every step downstream of transcription. High WER means the voice AI frequently misunderstands callers, leading to errors, frustration, and failed interactions. Monitoring WER helps identify problems with specific accents, acoustic conditions, or vocabulary that may need attention.
WER in practice
A voice AI platform monitors WER across different call segments. Analysis reveals 5% WER overall, but 15% WER for calls originating from a specific region. Investigation shows the ASR struggles with a regional accent. The team collects sample data from that region and fine-tunes the model, reducing regional WER to 6% and improving caller experience.
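Per-segment monitoring of this kind might look like the sketch below. Everything in it is illustrative: the segment labels, the transcripts, and the helper names `edit_distance` and `wer_by_segment` are hypothetical, and the toy data is not meant to reproduce the figures above. Errors and reference words are pooled within each segment before dividing, so long calls weigh more than short ones.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Minimum substitutions + insertions + deletions between word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_segment(calls):
    """calls: iterable of (segment, reference, hypothesis) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for segment, reference, hypothesis in calls:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[segment] += edit_distance(ref, hyp)
        words[segment] += len(ref)
    # Skip segments with no reference words to avoid dividing by zero.
    return {seg: errors[seg] / words[seg] for seg in words if words[seg] > 0}


# Hypothetical sample: two calls per segment, labeled by region.
calls = [
    ("region_a", "please check my account balance", "please check my account balance"),
    ("region_a", "i want to pay my bill", "i want to pay my bill"),
    ("region_b", "please check my account balance", "please check my count balance"),
    ("region_b", "i want to pay my bill", "i want to play my bill"),
]
rates = wer_by_segment(calls)
```

With this toy data, `rates["region_a"]` is 0.0 and `rates["region_b"]` is 2 / 11 (about 18%). A gap like that between segments is the kind of signal that would prompt a closer look at accent or audio conditions.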