Performance analysis of neural network, NMF and statistical approaches for speech enhancement

Authors
  • Kandagatla, Ravi Kumar1
  • Potluri, Venkata Subbaiah2
  • 1 Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Krishna Dt, Andhra Pradesh, 521230, India
  • 2 Velagapudi Ramakrishna Siddhartha Engineering College (Autonomous), Kanuru, Vijayawada, Andhra Pradesh, 520007, India
Type
Published Article
Journal
International Journal of Speech Technology
Publisher
Springer US
Publication Date
Sep 17, 2020
Volume
23
Issue
4
Pages
917–937
Identifiers
DOI: 10.1007/s10772-020-09751-6
Source
Springer Nature
License
Yellow

Abstract

Bayesian estimators are widely used in speech enhancement and noise reduction. However, traditional estimators process only the spectral amplitudes and leave the phase unprocessed. Among Bayesian estimators, super-Gaussian based estimators provide improved noise reduction, and super-Gaussian Bayesian estimators that use processed phase information to estimate the amplitudes improve the results further. In this work, Bayesian estimators based on Complex speech coefficients given Uncertain Phase (CUP), such as CUP-GG (a CUP estimator with speech spectral coefficients assumed as Gamma and noise spectral coefficients as Generalized Gamma) and CUP-NG (speech as Nakagami), are compared under white noise, pink noise, babble noise and non-stationary factory noise conditions. The statistical estimators are less effective under fully non-stationary conditions such as non-stationary factory noise and babble noise. Non-negative Matrix Factorization (NMF) based algorithms perform better for non-stationary noises. The drawback of NMF is that it requires a priori knowledge about speech. This drawback can be overcome by combining the advantages of the statistical and NMF approaches. The NR-NMF and WR-NMF speech enhancement methods are developed by providing posteriori regularization based on statistical assumptions about the distribution of the speech and noise DFT coefficients. A speech enhancement method that uses the CUP-GG estimator and NMF with online noise bases update is also considered for comparison. Progress in neural network based approaches for speech enhancement has further shown that, with a large dataset and better training, speech enhancement algorithms give improved results. In this work, a neural network approach for speech enhancement is implemented and compared with traditional estimators and NMF approaches. To generalize to unseen noise types, the proposed neural network approach uses dropout. To train the network, features obtained from the a priori SNR and the a posteriori SNR are used. The objective of this paper is to analyze the performance of speech enhancement methods based on neural networks, NMF and statistical approaches. The objective performance measures Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal to Noise Ratio (SNR) and Segmental SNR (Seg SNR) are considered for comparison.
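
Of the objective measures named in the abstract, SNR and segmental SNR are simple enough to sketch directly. The Python example below is a minimal illustration, not the authors' evaluation code. It assumes time-aligned clean and enhanced signals of equal length. The frame length of 256 samples and the per-frame clamp to [-10, 35] dB are common conventions chosen here as assumptions; the abstract does not state which settings the paper uses. The function names are hypothetical.

```python
import numpy as np

_EPS = np.finfo(float).eps  # guards against log(0) and division by zero


def snr_db(clean, enhanced):
    """Global SNR in dB, treating (clean - enhanced) as the residual noise."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    residual = clean - enhanced
    return 10.0 * np.log10((np.sum(clean ** 2) + _EPS) /
                           (np.sum(residual ** 2) + _EPS))


def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi].

    frame_len, lo and hi are assumed values, not taken from the paper.
    Frames do not overlap; trailing samples that do not fill a frame are dropped.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    usable = n_frames * frame_len
    c = clean[:usable].reshape(n_frames, frame_len)
    e = enhanced[:usable].reshape(n_frames, frame_len)
    signal_energy = np.sum(c ** 2, axis=1)
    residual_energy = np.sum((c - e) ** 2, axis=1)
    frame_snr = 10.0 * np.log10((signal_energy + _EPS) /
                                (residual_energy + _EPS))
    # Clamping keeps silent or perfectly reconstructed frames from dominating.
    return float(np.mean(np.clip(frame_snr, lo, hi)))
```

Because segmental SNR averages over short frames, it penalizes residual noise in low-energy speech segments more than the global SNR does, which is one reason the two measures are often reported together. PESQ and STOI need perceptual models and are normally computed with dedicated implementations.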
