Speaker detection in the wild: Lessons learned from JSALT 2019

  • García, Paola
  • Villalba, Jesus
  • Bredin, Hervé
  • Du, Jun
  • Castan, Diego
  • Cristia, Alejandrina
  • Bullock, Latane
  • Guo, Ling
  • Okabe, Koji
  • Nidadavolu, Phani Sankar
  • Kataria, Saurabh
  • Chen, Sizhu
  • Galmant, Léo
  • Lavechin, Marvin
  • Sun, Lei
  • Gill, Marie-Philippe
  • Ben-Yair, Bar
  • Abdoli, Sajjad
  • Wang, Xin
  • Bouaziz, Wassim
  • And 4 more
Publication Date
Dec 20, 2019

This paper presents the problems addressed, and the solutions developed, at the JSALT 2019 workshop for single-microphone speaker detection in adverse scenarios. The main focus was a wide range of conditions, from meetings to speech recorded in the wild. We describe the research threads we explored and a set of modules that proved successful in these scenarios. The ultimate goal was speaker detection, but our first finding was that effective diarization improves detection, and that omitting the diarization stage degrades performance. All the configurations we studied agree on this point and share a common backbone in which diarization precedes detection. With this backbone, we analyzed the following problems: voice activity detection, handling noisy signals, domain mismatch, improving the clustering, and the overall impact of earlier stages on the final speaker detection. We report partial results for speaker diarization, to give a better understanding of the problem, and final results for speaker detection.
