Classification trees with soft splits optimized for ranking

Authors
  • Dvořák, Jakub¹
  • ¹ Academy of Sciences of the Czech Republic, Institute of Computer Science, Pod Vodárenskou věží 271/2, 182 07 Prague 8, Czech Republic
Type
Published Article
Journal
Computational Statistics
Publisher
Springer Berlin Heidelberg
Publication Date
Feb 04, 2019
Volume
34
Issue
2
Pages
763–786
Identifiers
DOI: 10.1007/s00180-019-00867-1
Source
Springer Nature
License
Yellow

Abstract

We consider softening the splits in classification trees generated from multivariate numerical data. This methodology improves the quality of the ranking of test cases, as measured by the AUC. Several ways to determine the softening parameters are introduced and compared, including the softening algorithm present in the standard methods C4.5 and C5.0. In the first part of the paper, a few softening settings determined only from the ranges of the training data in the tree branches are explored. The trees softened with these settings are used to study the effect of using the Laplace correction together with soft splits. In the later part, we introduce methods that maximize the classifier's performance on the training set over the domain of the softening parameters. The Nelder–Mead nonlinear optimization algorithm is used, and various target functions are considered. A target function that evaluates the AUC on the training set is compared with functions that sum, over the training cases, some transformation of the error of the score. Several data sets from the UCI repository are used in the experiments.
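As a rough illustration of why softening a split can improve AUC-based ranking, the sketch below compares a hard and a softened version of a single split. It assumes a piecewise-linear softening between hypothetical bounds `lower` and `upper`: a case goes fully to the left branch at or below `lower`, fully to the right at or above `upper`, and is divided proportionally in between. It also assumes two-class Laplace-corrected leaf estimates (k + 1)/(n + 2) and uses toy data invented for the example. It is not the paper's procedure, nor the exact C4.5/C5.0 softening algorithm, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_left_weight(x: float, lower: float, upper: float) -> float:
    """Fraction of a case sent to the left branch (assumed linear softening).

    Fully left for x <= lower, fully right for x >= upper, linear in between.
    Setting lower == upper reproduces an ordinary hard split.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if x <= lower:
        return 1.0
    if x >= upper:
        return 0.0
    return (upper - x) / (upper - lower)


def laplace(pos: int, total: int) -> float:
    """Two-class Laplace-corrected estimate of P(class = 1) in a leaf."""
    return (pos + 1) / (total + 2)


def stump_score(x: float, lower: float, upper: float,
                left: Tuple[int, int], right: Tuple[int, int]) -> float:
    """Score of one case: leaf estimates mixed by the soft-split weight.

    `left` and `right` are (positive count, total count) for each leaf.
    """
    w = soft_left_weight(x, lower, upper)
    return w * laplace(*left) + (1.0 - w) * laplace(*right)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: P(score of a positive > score of a negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy one-feature data (invented for illustration); the split threshold is 3.5.
xs: List[float] = [1, 2, 3, 4, 5, 6]
ys: List[int] = [0, 0, 1, 0, 1, 1]
left_counts = (1, 3)   # cases x <= 3.5: labels 0, 0, 1
right_counts = (2, 3)  # cases x > 3.5: labels 0, 1, 1

hard = [stump_score(x, 3.5, 3.5, left_counts, right_counts) for x in xs]
soft = [stump_score(x, 1.5, 5.5, left_counts, right_counts) for x in xs]

hard_auc = auc(hard, ys)
soft_auc = auc(soft, ys)
print(f"hard split AUC = {hard_auc:.4f}, soft split AUC = {soft_auc:.4f}")
```

With the hard split, every case in a leaf receives the same score, so many positive/negative pairs are ties; the softened split spreads the scores within each leaf, which on this toy data raises the AUC from 6/9 to 8/9. In a real tree, the bounds would be chosen by one of the strategies the abstract mentions, for example from the training-data ranges in each branch or by numerically maximizing a training-set criterion.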
