Deep learning for churn prediction
- Publication Date: Dec 07, 2022
- Source: HAL-Descartes
- Language: English
- License: Unknown
Abstract
Churn prediction has traditionally been a field of study for marketing. However, in the wake of technological advances, more and more data can be collected to analyze customer behavior. This manuscript was written in that context, with a particular focus on machine learning. We first examined the supervised learning problem. We demonstrated that logistic regression, random forest, and XGBoost, taken as an ensemble, offer the best results in terms of Area Under the Curve (AUC) among a wide range of traditional machine learning approaches. We also showed that re-sampling approaches are effective only in a local setting and not in a global one. Subsequently, we aimed to fine-tune our predictions by relying on customer segmentation. Indeed, some customers may leave a service because of a cost that they deem too high, and others because of a problem with customer service. Our approach was enriched with a novel deep neural network architecture that combines auto-encoders with the k-means approach. Going further, we focused on self-supervised learning in the tabular domain. More precisely, the proposed architecture was inspired by the SimCLR approach, which we altered with the Mean-Teacher model from semi-supervised learning. Using a win matrix, we showed the superiority of our approach with respect to the state of the art. Finally, we proposed to apply what we built in this manuscript in an industrial setting, that of Brigad. We alleviated the company's churn problem with a random forest that we optimized through grid search and threshold optimization. We also proposed to interpret the results with SHAP (SHapley Additive exPlanations).