Affordable Access

Purchase Probability Prediction : Predicting likelihood of a new customer returning for a second purchase using machine learning methods

Authors
  • Alstermark, Olivia
  • Stolt, Evangelina
Publication Date
Jan 01, 2021
Source
DiVA - Academic Archive On-line
Keywords
Language
English
License
Green
External links

Abstract

When a company evaluates a customer for being a potential prospect, one of the key questions to answer is whether the customer will generate profit in the long run. A possible step to answer this question is to predict the likelihood of the customer returning to the company again after the initial purchase. The aim of this master thesis is to investigate the possibility of using machine learning techniques to predict the likelihood of a new customer returning for a second purchase within a certain time frame. To investigate to what degree machine learning techniques can be used to predict probability of return, a number of di↵erent model setups of Logistic Lasso, Support Vector Machine and Extreme Gradient Boosting are tested. Model development is performed to ensure well-calibrated probability predictions and to possibly overcome the diculty followed from an imbalanced ratio of returning and non-returning customers. Throughout the thesis work, a number of actions are taken in order to account for data protection. One such action is to add noise to the response feature, ensuring that the true fraction of returning and non-returning customers cannot be derived. To further guarantee data protection, axes values of evaluation plots are removed and evaluation metrics are scaled. Nevertheless, it is perfectly possible to select the superior model out of all investigated models. The results obtained show that the best performing model is a Platt calibrated Extreme Gradient Boosting model, which has much higher performance than the other models with regards to considered evaluation metrics, while also providing predicted probabilities of high quality. Further, the results indicate that the setups investigated to account for imbalanced data do not improve model performance. The main con- clusion is that it is possible to obtain probability predictions of high quality for new customers returning to a company for a second purchase within a certain time frame, using machine learning techniques. This provides a powerful tool for a company when evaluating potential prospects.

Report this publication

Statistics

Seen <100 times