Text-independent speaker identification based on deep Gaussian correlation supervector

Authors
  • Sun, Linhui (1, 2)
  • Gu, Ting (1)
  • Xie, Keli (1)
  • Chen, Jia (1)
  • (1) Nanjing University of Posts and Telecommunications, College of Telecommunications & Information Engineering, Nanjing, China
  • (2) Ministry of Education, Nanjing University of Posts and Telecommunications, Key Lab of Broadband Wireless Communication and Sensor Network Technology, Nanjing, China
Type
Published Article
Journal
International Journal of Speech Technology
Publisher
Springer US
Publication Date
May 03, 2019
Volume
22
Issue
2
Pages
449–457
Identifiers
DOI: 10.1007/s10772-019-09618-5
Source
Springer Nature
License
Yellow

Abstract

Great progress has been made in speaker recognition by extracting features from Gaussian mixture models (GMMs) or deep neural networks (DNNs). In this paper, to capture speaker-specific characteristics more accurately, we propose a novel deep Gaussian correlation supervector (DGCS) feature based on a hybrid model that combines a deep belief network (DBN) with a GMM. We first extract Mel-frequency cepstral coefficients (MFCCs) from preprocessed speech signals and use a DBN to obtain bottleneck features. The bottleneck features are then fed to a GMM to extract a deep Gaussian supervector (DGS), which can serve as the input to a support vector machine (SVM) for classification. Taking into account the correlation between the deep mean vectors of the DGS, we then transform the DGS into the DGCS by supervector recombination. Our experiments show that using the DGCS significantly improves the recognition rate: by 17.979% compared with the system that uses only the supervector, by 18.22% compared with the system that uses the DGS, and by 1.875% compared with the system that uses the correlation supervector. In addition, the proposed DGCS largely reduces the time complexity of the identification task.
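
To make the supervector idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the GMM component means over the bottleneck features are already available as an M x D array (random stand-in data is used here), and it builds a supervector by stacking those means. The abstract does not specify the recombination rule, so `correlation_supervector` is a hypothetical illustration that uses pairwise Pearson correlations between mean vectors; the function names and array sizes are likewise assumptions.

```python
import numpy as np


def mean_supervector(means):
    """Stack M component mean vectors (M x D) into one (M*D,) supervector."""
    return np.asarray(means, dtype=float).reshape(-1)


def correlation_supervector(means):
    """Hypothetical recombination (an assumption, not the paper's rule):
    take the pairwise Pearson correlations between the M mean vectors
    and flatten the strict upper triangle into a vector."""
    means = np.asarray(means, dtype=float)
    corr = np.corrcoef(means)  # (M, M); each row is treated as one variable
    upper = np.triu_indices(corr.shape[0], k=1)
    return corr[upper]  # length M * (M - 1) / 2


# Stand-in data: 4 GMM component means over 8-dimensional bottleneck features.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 8))

dgs = mean_supervector(means)
dgcs_like = correlation_supervector(means)
print(dgs.shape, dgcs_like.shape)
```

Under this assumed recombination, the output has M(M-1)/2 entries instead of M·D. Either vector could then be passed to an SVM classifier as one feature vector per utterance.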
