NMF-weighted SRP for multi-speaker direction of arrival estimation: robustness to spatial aliasing while exploiting sparsity in the atom-time domain

  • Thakallapalli, Sushmita1
  • Gangashetty, Suryakanth V.1, 2
  • Madhu, Nilesh3
  • 1 Speech Processing Laboratory, International Institute of Information Technology, Hyderabad, India
  • 2 Present address: K L University, Guntur, Andhra Pradesh, India
  • 3 IDLab, Dept. Electronics & Information Systems, Ghent University - imec, Ghent, Belgium
Published Article
EURASIP Journal on Audio, Speech, and Music Processing
Springer International Publishing
Publication Date: Mar 03, 2021
DOI: 10.1186/s13636-021-00201-y
Springer Nature


Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sparsity of speech in some representation for this purpose. Whereas broadband approaches exploit time-domain sparsity for multi-speaker localization, narrowband approaches can additionally exploit sparsity and disjointness in the time-frequency representation. Broadband approaches are robust to spatial aliasing but do not optimally exploit the frequency-domain sparsity, leading to poor localization performance for arrays with short inter-microphone distances. Narrowband approaches, on the other hand, are vulnerable to spatial aliasing, making them unsuitable for arrays with large inter-microphone spacing. Proposed here is an approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous multiple source localization. The decomposition into atoms is performed in situ using non-negative matrix factorization (NMF) of the short-term amplitude spectra, and the localization estimate is obtained via a broadband steered-response power (SRP) approach for each active atom of a time frame. This SRP-NMF approach thereby combines the advantages of the narrowband and broadband approaches and performs well on the multi-speaker localization task for a broad range of inter-microphone spacings. In tests conducted on real-world data from public challenges such as SiSEC and LOCATA, and on data generated from recorded room impulse responses, the SRP-NMF approach outperforms the commonly used variants of narrowband and broadband localization approaches in terms of source detection capability and localization accuracy.
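
The idea of weighting a broadband SRP by NMF atoms can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the two-microphone far-field model, the PHAT normalization, the Euclidean multiplicative-update NMF, and all function names (`nmf`, `atom_weighted_srp`, `estimate_doa`, `simulate_source`) are assumptions made for illustration only.

```python
import numpy as np


def nmf(V, n_atoms, n_iter=200, seed=0, eps=1e-12):
    """Euclidean NMF via multiplicative updates, V ~= W @ H (assumed variant).

    V: non-negative amplitude spectrogram, shape (n_freq, n_frames).
    Returns atoms W (n_freq, n_atoms) and activations H (n_atoms, n_frames).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + eps
    H = rng.random((n_atoms, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def atom_weighted_srp(X1, X2, W, freqs, mic_dist, angles_deg, c=343.0, eps=1e-12):
    """Broadband SRP per atom and frame for a two-microphone array.

    The phase-normalized (PHAT) cross-spectrum is weighted by each atom's
    spectrum and summed over ALL frequencies, so every atom yields one
    broadband steered response. Output shape: (n_atoms, n_frames, n_angles).
    """
    cross = X1 * np.conj(X2)
    phat = cross / (np.abs(cross) + eps)                      # (F, T)
    tau = mic_dist * np.cos(np.deg2rad(angles_deg)) / c       # candidate delays
    steer = np.exp(-2j * np.pi * freqs[:, None] * tau[None, :])  # (F, A)
    srp = np.empty((W.shape[1], X1.shape[1], tau.size))
    for k in range(W.shape[1]):
        # (T, F) @ (F, A): atom-weighted sum over frequency
        srp[k] = ((W[:, k, None] * phat).T @ steer).real
    return srp


def estimate_doa(X1, X2, W, freqs, mic_dist, angles_deg):
    """Return the SRP-maximizing angle for each (atom, frame) pair."""
    srp = atom_weighted_srp(X1, X2, W, freqs, mic_dist, angles_deg)
    return angles_deg[np.argmax(srp, axis=-1)]


def simulate_source(angle_deg, band, freqs, n_frames, mic_dist, rng, c=343.0):
    """Toy far-field source defined directly in the STFT domain.

    The source occupies frequency bins band[0]:band[1]; microphone 2
    receives it delayed by mic_dist * cos(angle) / c relative to microphone 1.
    """
    lo, hi = band
    S = np.zeros((freqs.size, n_frames), dtype=complex)
    S[lo:hi] = (rng.standard_normal((hi - lo, n_frames))
                + 1j * rng.standard_normal((hi - lo, n_frames)))
    tau = mic_dist * np.cos(np.deg2rad(angle_deg)) / c
    return S, S * np.exp(-2j * np.pi * freqs[:, None] * tau)


# Hypothetical usage: 0.2 m spacing, for which narrowband phase differences
# become ambiguous above c / (2 * mic_dist), roughly 860 Hz.
rng = np.random.default_rng(0)
freqs = np.linspace(0.0, 8000.0, 257)
angles = np.arange(0.0, 181.0, 1.0)
A1, A2 = simulate_source(60.0, (1, 129), freqs, 30, 0.2, rng)
B1, B2 = simulate_source(120.0, (129, 257), freqs, 30, 0.2, rng)
X1, X2 = A1 + B1, A2 + B2
W, H = nmf(np.abs(X1), n_atoms=2)
doa = estimate_doa(X1, X2, W, freqs, 0.2, angles)  # shape (2, 30)
```

In this sketch the activations `H` are not used for localization itself; under the approach described above they would indicate which atoms are active in a given frame, so that only those (atom, frame) estimates are kept, for example by thresholding `H` before aggregating `doa` into a histogram over angles.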
