Regression versus classification for neural network based audio source localization

  • Perotin, Lauréline
  • Défossez, Alexandre
  • Vincent, Emmanuel
  • Serizel, Romain
  • Guérin, Alexandre
Publication Date
Oct 20, 2019
Kaleidoscope Open Archive

We compare the performance of regression and classification neural networks for single-source direction-of-arrival estimation. Since the output space is continuous and structured, regression seems more appropriate. However, classification on a discrete spherical grid is widely believed to perform better and is predominantly used in the literature. For regression, we propose two ways to account for the spherical geometry of the output space, based either on the angular distance between spherical coordinates or on the mean squared error between Cartesian coordinates. For classification, we propose two alternatives to the classical one-hot encoding framework: we derive a Gibbs distribution from the squared angular distance between grid points and use the corresponding probabilities either as soft targets or as cross-entropy weights that retain a clear probabilistic interpretation. We show that regression on Cartesian coordinates is generally more accurate, except when localized interference is present, in which case classification appears to be more robust.
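The two regression losses and the Gibbs-distribution soft targets named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The azimuth/elevation convention, the helper names, and the inverse-temperature parameter `beta` are hypothetical choices made for illustration. The cross-entropy-weighting variant is not shown, because the abstract does not specify its exact form.

```python
import math


def sph_to_cart(az, el):
    """Unit vector from azimuth and elevation in radians (assumed convention)."""
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_distance(u, v):
    """Great-circle angle (radians) between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot)))


def cartesian_mse(pred, target):
    """Mean squared error between Cartesian coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def gibbs_targets(grid, true_dir, beta):
    """Gibbs distribution over grid points: p_i proportional to exp(-beta * d_i^2),
    where d_i is the angular distance from grid point i to the true direction.
    `beta` is a hypothetical inverse-temperature hyperparameter."""
    energies = [beta * angular_distance(g, true_dir) ** 2 for g in grid]
    m = min(energies)
    # Subtracting the minimum energy avoids underflow; it cancels in the normalization.
    weights = [math.exp(-(e - m)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def soft_target_ce(probs_pred, targets):
    """Cross-entropy with soft targets: -sum_i t_i * log(p_i)."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs_pred) if t > 0)
```

For example, `gibbs_targets` over eight horizontal grid points spaced 45 degrees apart, with a true direction at 10 degrees azimuth, puts the most mass on the 0-degree grid point. Unlike a one-hot target, it also assigns smoothly decreasing probability to neighbouring points. Under a uniform prediction, the soft-target cross-entropy reduces to `log(8)`.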