Attributes scaling for K-means algorithm controlled by misclassification of all clusters

Authors
Publisher
Chulalongkorn University
Publication Date
Keywords
  • Algorithms Controlled
  • Data Mining
  • Machine Learning
  • Cluster Analysis
Disciplines
  • Computer Science

Abstract

K-means clustering, one of the best-known distance-based clustering methods, is a popular unsupervised machine learning technique used in a wide range of applications. Researchers have tried to bring the concept of supervised learning into K-means clustering through an attribute-scaling vector. With this vector added, K-means clustering can be supervised by the target-class information provided in the training set. In this thesis, we explore and determine the optimal attribute-scaling vector, that is, the one that minimizes the misclassification rate of the target class. This thesis uses two non-linear unconstrained optimization techniques in the attribute-scaled space: the cyclic coordinate method and the Hooke and Jeeves method. Our experiments show that both methods can provide optimal scaling vectors that effectively reduce the misclassification error of supervised K-means clustering and lead to effective supervised clustering on some data sets. For other data sets, the misclassification error can still be reduced, but it remains too high, suggesting that supervised clustering is not suitable for those data sets.
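
The idea in the abstract can be illustrated with a minimal sketch; it is not the thesis's implementation. It assumes that each attribute is multiplied by a weight before K-means runs, that each cluster is labelled with its majority class, and that the misclassification rate is the share of points outside their cluster's majority class. The cyclic coordinate step is approximated by trying a small fixed grid of candidate weights per attribute instead of a continuous line search. All function names, the toy data, and the grid are hypothetical.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; the first k points are the starting centroids
    (a deterministic choice made only to keep this sketch reproducible)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def misclassification_rate(points, classes, weights, k):
    """Cluster the weight-scaled points, label each cluster with its majority
    class, and return the fraction of points outside that majority."""
    scaled = [[w * x for w, x in zip(weights, p)] for p in points]
    labels = kmeans(scaled, k)
    wrong = 0
    for c in range(k):
        members = [cls for cls, lab in zip(classes, labels) if lab == c]
        if members:
            wrong += len(members) - Counter(members).most_common(1)[0][1]
    return wrong / len(points)


def cyclic_coordinate_search(points, classes, k, grid, max_sweeps=10):
    """Optimize one weight at a time, cycling through the attributes until a
    full sweep brings no improvement. The grid replaces a true line search."""
    weights = [1.0] * len(points[0])
    best = misclassification_rate(points, classes, weights, k)
    for _ in range(max_sweeps):
        improved = False
        for j in range(len(weights)):
            for candidate in grid:
                trial = weights[:j] + [candidate] + weights[j + 1:]
                err = misclassification_rate(points, classes, trial, k)
                if err < best:  # accept strict improvements only
                    best, weights, improved = err, trial, True
        if not improved:
            break
    return weights, best


# Hypothetical toy data: attribute 0 separates the classes, while attribute 1
# is large-magnitude noise that dominates the unscaled distance.
data = [
    ((0.0, -10.0), 0), ((1.1, 10.0), 1), ((0.1, 10.0), 0), ((1.0, -10.0), 1),
    ((0.0, -9.0), 0), ((1.1, 9.0), 1), ((0.1, 9.0), 0), ((1.0, -9.0), 1),
]
points = [p for p, _ in data]
classes = [c for _, c in data]

base_err = misclassification_rate(points, classes, [1.0, 1.0], k=2)
best_w, best_err = cyclic_coordinate_search(
    points, classes, k=2, grid=[0.0, 0.1, 0.5, 1.0, 2.0])
print("unscaled error:", base_err)
print("best weights:", best_w, "error:", best_err)
```

On this toy data the unscaled clustering splits along the noisy attribute and misclassifies half the points; the search drives that attribute's weight to 0 and the error drops to 0. The Hooke and Jeeves method, not shown here, combines similar exploratory moves along each coordinate with an additional pattern move along the direction of overall improvement.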