The K-means algorithm is widely known for its simplicity and fastness in text clustering.However,the selection of the initial clus?tering center with the traditional K-means algorithm is some random,and therefore,the ...The K-means algorithm is widely known for its simplicity and fastness in text clustering.However,the selection of the initial clus?tering center with the traditional K-means algorithm is some random,and therefore,the fluctuations and instability of the clustering results are strongly affected by the initial clustering center.This paper proposed an algorithm to select the initial clustering center to eliminate the uncertainty of central point selection.The experiment results show that the improved K-means clustering algorithm is superior to the traditional algorithm.展开更多
Though K-means is very popular for general clustering, its performance, which generally converges to numerous local minima, depends highly on initial cluster centers. In this paper a novel initialization scheme to sel...Though K-means is very popular for general clustering, its performance, which generally converges to numerous local minima, depends highly on initial cluster centers. In this paper a novel initialization scheme to select initial cluster centers for K-means clustering is proposed. This algorithm is based on reverse nearest neighbor (RNN) search which retrieves all points in a given data set whose nearest neighbor is a given query point. The initial cluster centers computed using this methodology are found to be very close to the desired cluster centers for iterative clustering algorithms. This procedure is applicable to clustering algorithms for continuous data. The application of the proposed algorithm to K-means clustering algorithm is demonstrated. An experiment is carried out on several popular datasets and the results show the advantages of the proposed method.展开更多
Consensus clustering aims to fuse several existing basic partitions into an integrated one; this has been widely recognized as a promising tool for multi-source and heterogeneous data clustering. Owing to robust and h...Consensus clustering aims to fuse several existing basic partitions into an integrated one; this has been widely recognized as a promising tool for multi-source and heterogeneous data clustering. Owing to robust and high-quality performance over traditional clustering methods, consensus clustering attracts much attention, and much efforts have been devoted to develop this field. In the literature, the K-means-based Consensus Clustering(KCC) transforms the consensus clustering problem into a classical K-means clustering with theoretical supports and shows the advantages over the state-of-the-art methods. Although KCC inherits the merits from K-means,it suffers from the initialization sensitivity. Moreover, the current consensus clustering framework separates the basic partition generation and fusion into two disconnected parts. To solve the above two challenges, a novel clustering algorithm, named Greedy optimization of K-means-based Consensus Clustering(GKCC) is proposed.Inspired by the well-known greedy K-means that aims to solve the sensitivity of K-means initialization, GKCC seamlessly combines greedy K-means and KCC together, achieves the merits inherited by GKCC and overcomes the drawbacks of the precursors. Moreover, a 59-sampling strategy is conducted to provide high-quality basic partitions and accelerate the algorithmic speed. Extensive experiments on 36 benchmark datasets demonstrate the significant advantages of GKCC over KCC and KCC++ in terms of the objective function values and standard deviations and external cluster validity.展开更多
文摘The K-means algorithm is widely known for its simplicity and fastness in text clustering.However,the selection of the initial clus?tering center with the traditional K-means algorithm is some random,and therefore,the fluctuations and instability of the clustering results are strongly affected by the initial clustering center.This paper proposed an algorithm to select the initial clustering center to eliminate the uncertainty of central point selection.The experiment results show that the improved K-means clustering algorithm is superior to the traditional algorithm.
基金Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086)the Natural Science Foundation of Jiangsu Province (BK2006094)+1 种基金the Opening Foundation of Jiangsu Key Labo-ratory of Computer Information Processing Technology in Soochow University ( KJS0714)the Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082)
文摘Though K-means is very popular for general clustering, its performance, which generally converges to numerous local minima, depends highly on initial cluster centers. In this paper a novel initialization scheme to select initial cluster centers for K-means clustering is proposed. This algorithm is based on reverse nearest neighbor (RNN) search which retrieves all points in a given data set whose nearest neighbor is a given query point. The initial cluster centers computed using this methodology are found to be very close to the desired cluster centers for iterative clustering algorithms. This procedure is applicable to clustering algorithms for continuous data. The application of the proposed algorithm to K-means clustering algorithm is demonstrated. An experiment is carried out on several popular datasets and the results show the advantages of the proposed method.
基金supported in part by the National Natural Science Foundation of China (No. 71471009)
文摘Consensus clustering aims to fuse several existing basic partitions into an integrated one; this has been widely recognized as a promising tool for multi-source and heterogeneous data clustering. Owing to robust and high-quality performance over traditional clustering methods, consensus clustering attracts much attention, and much efforts have been devoted to develop this field. In the literature, the K-means-based Consensus Clustering(KCC) transforms the consensus clustering problem into a classical K-means clustering with theoretical supports and shows the advantages over the state-of-the-art methods. Although KCC inherits the merits from K-means,it suffers from the initialization sensitivity. Moreover, the current consensus clustering framework separates the basic partition generation and fusion into two disconnected parts. To solve the above two challenges, a novel clustering algorithm, named Greedy optimization of K-means-based Consensus Clustering(GKCC) is proposed.Inspired by the well-known greedy K-means that aims to solve the sensitivity of K-means initialization, GKCC seamlessly combines greedy K-means and KCC together, achieves the merits inherited by GKCC and overcomes the drawbacks of the precursors. Moreover, a 59-sampling strategy is conducted to provide high-quality basic partitions and accelerate the algorithmic speed. Extensive experiments on 36 benchmark datasets demonstrate the significant advantages of GKCC over KCC and KCC++ in terms of the objective function values and standard deviations and external cluster validity.