The effect of dialect mismatch on likelihood-maximising speech enhancement for noise-robust speech recognition

Publisher
The Australasian Speech Science & Technology Association
Keywords
  • Speech Recognition
  • Speech Enhancement
  • 090609 Signal Processing
  • Optimization Methods
  • Accent Mismatch

Abstract

Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimises the parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracy under additive background noise across a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present, namely background noise and speaker accent. Experiments using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases support this discussion, demonstrating that speech recognition performance is inferior when a second layer of mismatch is encountered during evaluation.
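
The core LIMA idea described above, choosing enhancement parameters by the likelihood of a known state sequence rather than by a signal-level criterion, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the single over-subtraction factor `alpha`, the diagonal-Gaussian state models, the grid search (standing in for whatever optimiser a real system would use) and the synthetic data are all assumptions made for illustration only. It also models only one layer of mismatch (additive noise), not accent mismatch.

```python
import numpy as np


def subtract_noise(noisy_mel, noise_mel, alpha, floor=0.05):
    """Mel-filterbank noise subtraction with a hypothetical over-subtraction factor."""
    enhanced = noisy_mel - alpha * noise_mel
    # Floor relative to the noisy energy so the later log stays defined.
    return np.maximum(enhanced, floor * noisy_mel)


def alignment_log_likelihood(log_mel, states, means, variances):
    """Sum of diagonal-Gaussian log-likelihoods along a fixed state sequence."""
    mu = means[states]        # shape: (frames, bands)
    var = variances[states]
    ll = -0.5 * (np.log(2.0 * np.pi * var) + (log_mel - mu) ** 2 / var)
    return float(ll.sum())


def lima_grid_search(noisy_mel, noise_mel, states, means, variances, alphas):
    """Pick the alpha whose enhanced features maximise the alignment likelihood."""
    scores = []
    for a in alphas:
        log_mel = np.log(subtract_noise(noisy_mel, noise_mel, a))
        scores.append(alignment_log_likelihood(log_mel, states, means, variances))
    best = int(np.argmax(scores))
    return float(alphas[best]), scores


# Synthetic toy data (assumed, not taken from any speech database).
rng = np.random.default_rng(0)
n_states, n_bands, n_frames = 3, 8, 60
means = rng.normal(2.0, 0.5, size=(n_states, n_bands))   # "clean" state models
variances = np.full((n_states, n_bands), 0.1)
states = np.repeat(np.arange(n_states), n_frames // n_states)  # known alignment
clean_mel = np.exp(means[states] + rng.normal(0.0, 0.1, size=(n_frames, n_bands)))
noise_mel = np.full(n_bands, 3.0)                         # stationary additive noise
noisy_mel = clean_mel + noise_mel

alphas = np.linspace(0.0, 2.0, 21)
best_alpha, scores = lima_grid_search(
    noisy_mel, noise_mel, states, means, variances, alphas
)
print("alpha maximising alignment likelihood:", best_alpha)
```

In this sketch the state models match the clean data by construction, so maximising the likelihood pulls `alpha` toward the value that removes the noise. If the models were instead trained on speech that differs from the test speech in some other respect, the same objective would have no way to separate that difference from the noise, which is the kind of situation the abstract describes.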
