Abstract
A novel clustering algorithm based on fast search and find of density peaks (DP) was proposed in Science in 2014 and has attracted much attention from researchers. DP can easily select cluster centers from a decision graph. However, it cannot cluster manifold data sets, because its distance measure is not suitable for evaluating the dissimilarity between objects on a manifold structure. Some researchers use a graph-based distance to measure the dissimilarity between objects in manifold clusters, but computing the graph-based distance on the original data set is time-consuming. This paper proposes SLORE-DP, an improved density peaks clustering algorithm based on shared neighbors between local cores. First, it finds local cores to represent the data set and redefines the graph-based distance between local cores using a shared-neighbors-based distance. Then, the natural-neighbor-based density and the newly defined graph-based distance are used to construct a decision graph on the local cores, and the DP algorithm is employed to cluster the local cores. Finally, each remaining point is assigned to the same cluster as its local core. Because the newly defined graph-based distance is used to estimate the dissimilarity between local cores, SLORE-DP can cluster manifold data sets; at the same time, it computes shortest paths only between local cores rather than between all points, which greatly reduces its running time. We conduct experiments on several synthetic data sets containing manifold clusters and on several real data sets from the UCI repository. The results show that SLORE-DP is more effective and efficient than the compared algorithms when clustering manifold data sets.
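The DP step that SLORE-DP builds on can be sketched as follows. This is a minimal Python/NumPy sketch of the original density peaks procedure, not of SLORE-DP itself. It assumes a cutoff-kernel density (the number of points closer than a cutoff d_c) in place of the natural-neighbor density, and it accepts any precomputed distance matrix, so a graph-based distance between local cores could be passed in. As a further assumption, instead of picking centers by eye on the decision graph, it selects the n_clusters points with the largest product of density and delta. The names decision_graph and dp_cluster are hypothetical.

```python
import numpy as np


def decision_graph(dist, d_c):
    """Compute DP decision-graph quantities from a precomputed distance matrix."""
    n = dist.shape[0]
    # Cutoff-kernel density: number of other points closer than d_c
    # (an assumption; SLORE-DP uses a natural-neighbor density instead).
    rho = (dist < d_c).sum(axis=1) - 1
    # Points in decreasing density; the stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # distance to the nearest denser point
        nearest_higher[i] = j
    return rho, delta, nearest_higher, order


def dp_cluster(dist, d_c, n_clusters):
    """Pick centers by gamma = rho * delta, then propagate labels (hypothetical helper)."""
    rho, delta, nearest_higher, order = decision_graph(dist, d_c)
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters].copy()
    # The densest point has no denser neighbor, so it must be a center.
    if order[0] not in centers:
        centers[-1] = order[0]
    labels = np.full(dist.shape[0], -1)
    labels[centers] = np.arange(n_clusters)
    # In decreasing density, each unlabeled point takes the label of its
    # nearest denser neighbor, which has already been labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


# Two well-separated 5-point groups; the middle point of each is densest.
pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
pts = np.vstack([pts, pts + 10.0])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
labels, centers = dp_cluster(dist, d_c=0.8, n_clusters=2)
# labels: [0 0 0 0 0 1 1 1 1 1], centers: [4 9]
```

The final loop is the DP assignment rule: every non-center point joins the cluster of its nearest neighbor of higher density. In the pipeline described above, this rule would be applied to the local cores only, and each remaining point would then take the label of its local core.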
I. Introduction
As an unsupervised learning method, clustering is an important tool for data analysis and has been widely used in pattern recognition, image processing, and information retrieval. It aims to divide objects into multiple clusters so that similar objects fall in the same cluster while dissimilar objects fall in different clusters. Many clustering algorithms have been proposed over the past few decades. According to their strategies, these algorithms can be roughly grouped into partitioning, density-based, hierarchical, model-based, and grid-based methods. Among them, partitioning, density-based, and hierarchical algorithms are the most popular because of their simple principles. K-means [1] and K-medoids [2] are typical partitioning algorithms; however, their performance depends on the selection of the initial cluster centers. To avoid selecting initial centers, the AP algorithm [3] treats all objects as potential centers. K-AP [4] is an improved AP algorithm that directly produces K clusters by introducing a constraint into the message-passing process. However, since each point is always assigned to its nearest center, these algorithms cannot discover arbitrarily shaped clusters. DBSCAN [5] is a typical density-based clustering algorithm that defines clusters as dense regions separated by sparse regions. Dcore [6] is a hybrid decentralized approach based on finding density cores instead of centroids.
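DBSCAN's idea of clusters as dense regions separated by sparse regions can be illustrated with a minimal sketch. This is a brute-force O(n²) Python/NumPy version written for illustration, not the reference implementation of [5]. The function name dbscan and the constant NOISE are hypothetical, while eps and min_pts correspond to DBSCAN's Eps and MinPts parameters. A point is a core point if its eps-neighborhood (including itself) holds at least min_pts points. Clusters grow through chains of core points, so unlike nearest-center assignment they can follow arbitrary shapes. Points reachable from no core point keep the noise label.

```python
import numpy as np

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN: returns cluster ids 0, 1, ... or NOISE."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Eps-neighborhoods include the point itself.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    # Core points have at least min_pts points in their eps-neighborhood.
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, NOISE)
    cluster = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue
        # Start a new cluster at an unlabeled core point and expand it
        # through chains of density-reachable points.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Two 5-point chains 10 units apart plus one isolated point.
pts = np.array([[x, 0.0] for x in range(5)] + [[x, 10.0] for x in range(5)] + [[50.0, 50.0]])
labels = dbscan(pts, eps=1.1, min_pts=3)
# labels: [0 0 0 0 0 1 1 1 1 1 -1]
```

Because a cluster is extended only through core points, the endpoints of each chain (with just two points in their neighborhood) are absorbed as border points, while the isolated point is left as noise.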