Estimating the number of clusters via a corrected clustering instability

Authors
  • Haslbeck, Jonas M. B. (1)
  • Wulff, Dirk U. (2, 3)
  • (1) University of Amsterdam
  • (2) University of Basel
  • (3) Max Planck Institute for Human Development
Type
Published Article
Journal
Computational Statistics
Publisher
Springer Berlin Heidelberg
Publication Date
May 18, 2020
Volume
35
Issue
4
Pages
1879–1894
Identifiers
DOI: 10.1007/s00180-020-00981-5
PMID: 33088024
PMCID: PMC7550318
Source
PubMed Central
License
Unknown

Abstract

We improve instability-based methods for selecting the number of clusters k in cluster analysis by developing a corrected clustering distance that removes the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible k, overcoming limitations of current instability-based methods for large k. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R package cstab.
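
For readers unfamiliar with instability-based selection of k, the following is a minimal sketch of the general idea, not the authors' method and not the cstab API. It assumes k-means as the clustering algorithm, bootstrap resampling, and an uncorrected pair-disagreement distance computed on the points that two bootstrap samples share. The correction for cluster sizes described in the abstract is deliberately not implemented here. All function names (kmeans, assign, pair_disagreement, instability, select_k) are hypothetical and chosen for illustration only.

```python
import random
from itertools import combinations


def assign(points, centers):
    """Label each point with the index of its nearest center (squared Euclidean)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's algorithm with random initial centers; returns the centers."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers


def pair_disagreement(labels_a, labels_b):
    """Share of point pairs that one labeling joins and the other separates.

    This is an UNCORRECTED distance: it does not adjust for cluster sizes.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:  # fewer than two shared points: nothing to compare
        return 0.0
    differ = sum((labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
                 for i, j in pairs)
    return differ / len(pairs)


def instability(points, k, n_boot=10, seed=0):
    """Mean pair disagreement between clusterings of two bootstrap samples."""
    rng = random.Random(seed)
    n = len(points)
    total = 0.0
    for _ in range(n_boot):
        idx_a = [rng.randrange(n) for _ in range(n)]
        idx_b = [rng.randrange(n) for _ in range(n)]
        centers_a = kmeans([points[i] for i in idx_a], k, rng)
        centers_b = kmeans([points[i] for i in idx_b], k, rng)
        # Compare the two clusterings only on points drawn into both samples.
        shared = [points[i] for i in sorted(set(idx_a) & set(idx_b))]
        total += pair_disagreement(assign(shared, centers_a),
                                   assign(shared, centers_b))
    return total / n_boot


def select_k(points, k_values, **kwargs):
    """Return the candidate k with the lowest estimated instability."""
    return min(k_values, key=lambda k: instability(points, k, **kwargs))
```

Under these assumptions, a call such as select_k(points, range(2, 8), n_boot=20), with points given as a list of coordinate tuples, returns the candidate k whose clusterings vary least across resamples. Because the distance above is uncorrected, it is exactly the kind of measure whose dependence on the distribution of cluster sizes the abstract says the corrected distance is designed to remove.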
