Random forests and interpretability of learning algorithms
- Publication Date
- Dec 06, 2021
- Source
- HAL-SHS
- Language
- English
Abstract
This thesis deals with the interpretability of learning algorithms in an industrial context. Manufacturing production and the design of industrial systems are two examples where the interpretability of learning methods makes it possible to grasp how the inputs and outputs of a system are connected, and therefore to improve the system's efficiency. Although there is no consensus on a precise definition of interpretability, several requirements can be identified: “simplicity, stability, and accuracy”, rarely all satisfied by existing interpretable methods. The structure and stability of random forests make them good candidates for improving the performance of interpretable algorithms.

The first part of this thesis is dedicated to post-hoc methods, in particular variable importance measures for random forests. The first convergence result for Breiman’s MDA is established and shows, from a sensitivity analysis perspective, that this measure is strongly biased. The Sobol-MDA algorithm is introduced to fix these flaws, replacing permutations by projections. An extension to Shapley effects, an efficient importance measure when input variables are dependent, is then proposed with the SHAFF algorithm.

The second part of this thesis focuses on rule learning models: simple and highly predictive algorithms that are nevertheless often unstable with respect to small data perturbations. The SIRUS algorithm is designed to extract a compact rule ensemble from a random forest, and considerably improves stability over state-of-the-art competitors while preserving simplicity and accuracy.
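For reference, Breiman's permutation-based MDA targets a population quantity that is usually written as below; the notation here is a standard formulation assumed for illustration, not quoted from the thesis. The key point is that permuting the j-th input breaks its dependence with the other inputs, which is the root of the bias when inputs are correlated.

```latex
% Population MDA of input j (standard formulation, notation assumed):
% the increase in quadratic risk when the j-th input of the regression
% function m is replaced by an independent permuted copy.
\[
  \mathrm{MDA}^{\star}(j)
  \;=\;
  \mathbb{E}\!\left[\bigl(Y - m(X_{\pi_j})\bigr)^{2}\right]
  \;-\;
  \mathbb{E}\!\left[\bigl(Y - m(X)\bigr)^{2}\right],
\]
% where $X_{\pi_j} = (X_1, \ldots, X_{j-1}, X_j', X_{j+1}, \ldots, X_p)$
% and $X_j'$ is an independent copy of $X_j$.
```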
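As a concrete point of reference for the permutation mechanism discussed above, here is a minimal sketch of MDA-style importance using scikit-learn's `permutation_importance`; the synthetic dataset and parameter choices are illustrative only and are not taken from the thesis.

```python
# Minimal sketch of permutation-based variable importance (MDA-style).
# Data and settings are illustrative, not from the thesis.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic regression data with a few informative inputs.
X, y = make_regression(n_samples=1000, n_features=10, n_informative=4,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=300, random_state=0)
forest.fit(X_train, y_train)

# Permute one input at a time and measure the accuracy drop: this is the
# mechanism whose bias under dependent inputs the thesis analyzes.
result = permutation_importance(forest, X_test, y_test,
                                n_repeats=10, random_state=0)
for j in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {j}: {result.importances_mean[j]:.3f} "
          f"+/- {result.importances_std[j]:.3f}")
```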
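To give a flavor of the rule-extraction idea behind SIRUS, the toy sketch below counts how often each (feature, threshold) split recurs across the trees of a shallow forest and keeps the most frequent ones as candidate rules. This illustrates the principle only: the actual SIRUS algorithm stabilizes splits by restricting thresholds to empirical quantiles of the data, whereas the rounding used here is a crude stand-in for that discretization.

```python
# Toy illustration of extracting frequent splits from a random forest
# as candidate rules. NOT the actual SIRUS algorithm.
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=500, max_depth=2,
                                random_state=0).fit(X, y)

splits = Counter()
for tree in forest.estimators_:
    t = tree.tree_
    for node in range(t.node_count):
        if t.children_left[node] != -1:  # internal (non-leaf) node
            # Rounding the threshold is a crude stand-in for SIRUS's
            # quantile discretization, which makes splits recur exactly.
            splits[(t.feature[node], round(float(t.threshold[node]), 1))] += 1

# The most frequent splits form candidate rules of the form "X_j <= s".
for (feature, threshold), count in splits.most_common(5):
    print(f"rule: X_{feature} <= {threshold}  (appears in {count} nodes)")
```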