DENDIS: a new density-based sampling for clustering algorithm

Authors
  • Ros, F.
  • Guillaume, S.
Publication Date
Jan 01, 2016
Source
HAL-UPMC
Language
English
License
Unknown
Abstract

To deal with large datasets, sampling can be used as a preprocessing step for clustering. In this paper, a hybrid sampling algorithm is proposed. It is density-based, while also using distance concepts to ensure space coverage and fit cluster shapes. At each step a new item is added to the sample: it is chosen as the item furthest from the representative in the most important group. A constraint on the hypervolume induced by the samples avoids oversampling in high-density areas. The inner structure allows for internal optimization: only a few distances have to be computed. The algorithm's behavior is investigated using synthetic and real-world datasets and compared to alternative approaches, at both conceptual and empirical levels. The numerical experiments show that it is more parsimonious, faster and more accurate, according to the Rand Index, with both k-means and hierarchical clustering algorithms.
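
The selection step described in the abstract (add the item furthest from the representative of the most important group) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the "most important group" is the most populated one, that sampling stops once no group holds more than a fraction `min_group_fraction` of the data, and that distances are Euclidean. The function name `density_farthest_sample` and its parameters are hypothetical. The hypervolume constraint and the distance-saving optimization mentioned in the abstract are omitted.

```python
import math
import random


def density_farthest_sample(points, min_group_fraction=0.05, seed=0):
    """Illustrative sketch only; NOT the DENDIS algorithm as published.

    Assumptions (not from the source): the "most important" group is the
    most populated one, and sampling stops when every group holds at most
    min_group_fraction of the points.
    """
    rng = random.Random(seed)
    n = len(points)
    # Start from one random representative; every point is attached to it.
    reps = [rng.randrange(n)]
    owner = [0] * n  # position in `reps` of each point's nearest representative
    dist = [math.dist(p, points[reps[0]]) for p in points]

    while True:
        # Count how many points are attached to each representative.
        sizes = [0] * len(reps)
        for o in owner:
            sizes[o] += 1
        largest = max(range(len(reps)), key=sizes.__getitem__)
        if sizes[largest] <= min_group_fraction * n:
            break  # every group is small enough: stop sampling

        # New sample item: the member of the largest group that is
        # furthest from that group's representative.
        members = [i for i in range(n) if owner[i] == largest]
        new = max(members, key=dist.__getitem__)
        if dist[new] == 0.0:
            break  # the group only contains duplicates of its representative

        reps.append(new)
        new_pos = len(reps) - 1
        # Re-attach every point that is closer to the new representative.
        # (This naive pass computes n distances per step.)
        for i in range(n):
            d = math.dist(points[i], points[new])
            if d < dist[i]:
                dist[i] = d
                owner[i] = new_pos

    return [points[r] for r in reps]
```

Under these assumptions, dense regions keep receiving representatives until their groups fall below the size threshold, while the furthest-item rule spreads representatives across each group. A clustering algorithm such as k-means could then be run on the returned sample instead of the full dataset.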
