Functional anomaly detection and robust estimation

Authors
  • Staerman, Guillaume
Publication Date
Apr 12, 2022
Source
HAL
Language
English
License
Unknown

Abstract

Enthusiasm for Machine Learning is spreading to nearly all fields, such as transportation, energy, medicine, banking and insurance, as the ubiquity of sensors through the IoT makes ever more data available at ever finer granularity. The abundance of new applications for monitoring complex infrastructures (e.g. aircraft, energy networks), together with the availability of massive data samples, has put pressure on the scientific community to develop new, reliable Machine Learning methods and algorithms. The work presented in this thesis focuses on two axes, unsupervised functional anomaly detection and robust learning, from both practical and theoretical perspectives.

The first part of this dissertation is dedicated to the development of efficient functional anomaly detection approaches. More precisely, we introduce Functional Isolation Forest (FIF), an algorithm that randomly splits the functional space in a flexible manner in order to progressively isolate specific function types. We also propose a novel notion of functional depth based on the area of the convex hull of sampled curves. It naturally captures gradual departures from centrality, even beyond the envelope of the data. Estimation and computational issues are addressed, and various numerical experiments provide empirical evidence of the relevance of the proposed approaches. To provide guidance for practitioners, the performance of recent functional anomaly detection techniques is evaluated on two real-world data sets, one related to the monitoring of helicopters in flight and one to the spectrometry of construction materials.

The second part describes the design and analysis of several robust statistical approaches relying on robust mean estimation and statistical data depth. The Wasserstein distance is a popular metric between probability distributions based on optimal transport.
Although the Wasserstein distance has shown promising results in many Machine Learning applications, it is highly sensitive to outliers. To address this, we investigate how to leverage Median-of-Means (MoM) estimators to robustify the estimation of the Wasserstein distance with provable guarantees. Next, a new statistical depth function, the Affine-Invariant Integrated Rank-Weighted (AI-IRW) depth, is introduced. Beyond the theoretical analysis carried out, numerical results are presented, providing strong empirical confirmation of the relevance of the proposed depth function. The upper-level sets of statistical depths, the depth-trimmed regions, give rise to a definition of multivariate quantiles. We propose a new discrepancy measure between probability distributions that relies on the average of the Hausdorff distance between the depth-based quantile regions of each distribution. We show that this measure inherits attractive properties of data depths such as robustness and interpretability. All algorithms developed in this thesis are open-sourced and available online.
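The abstract describes a functional depth based on the area of the convex hull of sampled curves. The sketch below illustrates one plausible reading of that idea and is not the thesis implementation. It assumes that all curves are observed on a common grid and treats each curve as the 2-D point set of its graph {(t_j, x(t_j))}. It then draws K curves at random from the sample and averages the ratio area(hull of the K graphs) / area(hull of the K graphs plus the candidate curve). The names `ach_depth`, `hull_area` and `graph_points` are hypothetical, as are the parameters `K` and `n_draws`. The exact definition, normalization and estimator used in the thesis may differ.

```python
import numpy as np


def _cross(o, a, b):
    # z-component of the cross product (a - o) x (b - o); > 0 for a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_area(points):
    """Area of the 2-D convex hull of `points` (Andrew's monotone chain + shoelace formula)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return 0.0
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # hull vertices in counter-clockwise order
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2.0


def graph_points(grid, curves):
    """Stack the graphs {(t_j, x(t_j))} of one or several discretized curves."""
    curves = np.atleast_2d(curves)
    return np.column_stack([np.tile(grid, curves.shape[0]), curves.ravel()])


def ach_depth(x, sample, grid, K=2, n_draws=200, seed=0):
    """Monte Carlo sketch of an area-of-convex-hull functional depth (hypothetical form).

    Averages area(hull(S)) / area(hull(S with x added)) over random K-subsets S of `sample`.
    The ratio is 1 when the graph of x lies inside the hull of S and decreases as x
    moves further outside it.
    """
    rng = np.random.default_rng(seed)
    x_pts = graph_points(grid, x)
    ratios = []
    for _ in range(n_draws):
        idx = rng.choice(len(sample), size=K, replace=False)
        base = graph_points(grid, sample[idx])
        denom = hull_area(np.vstack([base, x_pts]))
        ratios.append(hull_area(base) / denom if denom > 0 else 1.0)
    return float(np.mean(ratios))


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 50)
    rng = np.random.default_rng(1)
    # 40 vertically shifted copies of a sine curve
    sample = np.sin(2 * np.pi * grid) + rng.normal(0.0, 0.3, size=(40, 1))
    # The unshifted (central) curve should receive the highest score
    for shift in (0.0, 3.0, 6.0):
        print(shift, ach_depth(np.sin(2 * np.pi * grid) + shift, sample, grid))
```

A curve whose graph lies inside the hull of the sampled graphs leaves the area unchanged and gets a ratio of 1. A curve outside that band enlarges the hull by an amount that keeps growing as it moves away. As a result, the score keeps decreasing beyond the envelope of the data rather than dropping to a constant. The hull is computed with Andrew's monotone chain and its area with the shoelace formula, so the only dependency is NumPy.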
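For the AI-IRW depth, the integrated rank-weighted (IRW) depth of a point x averages a univariate depth over directions u drawn uniformly on the unit sphere. That univariate depth is min(F_u(<u, x>), 1 - F_u(<u, x>)), where F_u is the CDF of the data projected onto u. An affine-invariant variant can be obtained by first whitening the data with a location and scatter estimate. The following is a Monte Carlo sketch under stated assumptions and is not the thesis code. It whitens with the sample mean and sample covariance, whereas the thesis may rely on a robust scatter estimator instead. It uses `n_dirs` Gaussian directions normalized to the sphere. The name `ai_irw_depth` is hypothetical.

```python
import numpy as np


def ai_irw_depth(X, data, n_dirs=2000, seed=0):
    """Monte Carlo sketch of an affine-invariant IRW depth (hypothetical helper).

    X    : (m, d) query points
    data : (n, d) sample; its covariance matrix must be invertible
    Returns an (m,) array of depth values in [0, 0.5].
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    data = np.asarray(data, dtype=float)
    n, d = data.shape

    # Whitening with Sigma^{-1/2}, built from the eigendecomposition of the sample
    # covariance (assumption: a robust scatter estimate could replace np.cov here).
    mu = data.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.atleast_2d(np.cov(data, rowvar=False)))
    W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Zd = (data - mu) @ W
    Zx = (X - mu) @ W

    # Directions drawn uniformly on the unit sphere S^{d-1}
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)

    proj_data = np.sort(Zd @ U.T, axis=0)  # (n, n_dirs), each column sorted
    proj_x = Zx @ U.T                      # (m, n_dirs)

    depth = np.zeros(X.shape[0])
    for k in range(n_dirs):
        # Empirical CDF of the projected sample, evaluated at the projected query points
        F = np.searchsorted(proj_data[:, k], proj_x[:, k], side="right") / n
        depth += np.minimum(F, 1.0 - F)
    return depth / n_dirs
```

With the sample covariance, whitening turns any invertible affine map applied to both the sample and the query points into a rotation of the whitened cloud. Because the directions are uniform on the sphere, the depth values therefore agree up to Monte Carlo error. The deepest points score close to 1/2, and points far outside the data score close to 0.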