ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

Authors
  • Wright, Marvin N.
  • Ziegler, Andreas
Type
Preprint
Publication Date
Aug 18, 2015
Submission Date
Aug 18, 2015
Identifiers
arXiv ID: 1508.04409
Source
arXiv
License
Yellow

Abstract

We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high-dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package against a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory-efficient implementation of random forests for analyzing data on the scale of a genome-wide association study.
