Parallel k-means algorithm based on constrained information (cited by: 8)
Abstract: A parallel k-means clustering algorithm based on constrained information is proposed so that users obtain the clustering results they expect on distributed data sets. Parallel k-means can effectively cluster horizontally distributed data sets, in which each site holds a different subset of the samples described by the same attributes. Building on this analysis, the objective function of parallel k-means is modified to design a constrained parallel k-means algorithm. The algorithm introduces the constraint information of each site's users into the distributed clustering process in the form of chunklets, which guides it toward a biased search. In theory, the constrained parallel k-means algorithm guarantees that the within-cluster distance of the unconstrained samples is minimized. It also guarantees that the average distance between the samples in each chunklet constraint and the corresponding cluster center is minimized. Experimental results show that the algorithm effectively improves the clustering accuracy of parallel k-means. In a distributed environment, it also obtains clustering results equivalent to those that existing constrained clustering algorithms obtain on centralized data sets.
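The abstract describes the method only at a high level. The following minimal Python/NumPy sketch shows one plausible reading of it and is not the authors' implementation. Sites are simulated in-process with no real network layer. Each site assigns unconstrained samples to the nearest center, as in ordinary k-means. It assigns every chunklet (a set of samples that the site's user says belong to the same cluster) as a block to the center with the smallest average squared distance to its members. Only per-cluster coordinate sums and counts are sent to a coordinator, which averages them into new centers, following the usual data-parallel k-means scheme (as in Dhillon and Modha's distributed-memory k-means). The function names, the input format (a list of `(points, chunklets)` pairs with disjoint chunklets), squared Euclidean distance and the convergence tolerance are all assumptions made for illustration.

```python
import numpy as np


def local_statistics(points, chunklets, centers):
    """One site's assignment step (hypothetical helper, not from the paper).

    points    : (n, d) array holding this site's samples.
    chunklets : list of disjoint index lists; each list names samples (rows of
                `points`) that this site's user says belong to the same cluster.
    centers   : (k, d) array of the current global cluster centers.
    Returns per-cluster coordinate sums (k, d), member counts (k,) and this
    site's contribution to the total squared distance.
    """
    k, d = centers.shape
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    sse = 0.0
    in_chunklet = np.zeros(len(points), dtype=bool)

    # Each chunklet is assigned as a block to the center with the smallest
    # average squared distance to its members.
    for idx in chunklets:
        members = points[idx]
        avg = ((members[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).mean(axis=0)
        j = int(np.argmin(avg))
        sums[j] += members.sum(axis=0)
        counts[j] += len(idx)
        sse += avg[j] * len(idx)
        in_chunklet[idx] = True

    # Unconstrained samples go to their nearest center, as in plain k-means.
    free = points[~in_chunklet]
    if len(free):
        dist = ((free[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        np.add.at(sums, labels, free)
        np.add.at(counts, labels, 1)
        sse += dist[np.arange(len(free)), labels].sum()
    return sums, counts, sse


def constrained_parallel_kmeans(sites, init_centers, max_iter=100, tol=1e-9):
    """Coordinator loop over simulated sites (hypothetical sketch).

    sites : list of (points, chunklets) pairs, one per site. Only the k*d sums,
            k counts and one scalar per site are combined; raw samples are never pooled.
    Returns the final centers and the objective value of the last assignment step.
    """
    centers = np.array(init_centers, dtype=float)
    prev_sse = np.inf
    for _ in range(max_iter):
        stats = [local_statistics(p, ch, centers) for p, ch in sites]
        sums = sum(s for s, _, _ in stats)
        counts = sum(n for _, n, _ in stats)
        sse = sum(e for _, _, e in stats)
        # Global update; a cluster that received no samples keeps its old center.
        nonempty = counts > 0
        centers[nonempty] = sums[nonempty] / counts[nonempty][:, None]
        if prev_sse - sse <= tol:
            break
        prev_sse = sse
    return centers, sse
```

The coordinator combines only aggregated sums and counts. Running the sketch on data split across sites therefore gives the same centers, up to floating-point rounding, as running it on the pooled data with the same chunklets. This matches the kind of equivalence the abstract reports. Given the centers, the block assignment minimizes the total squared distance, and the mean update minimizes it given the assignment. The objective therefore never increases between iterations, as in ordinary k-means.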
Source: Journal of Southeast University (Natural Science Edition), 2011, No. 3, pp. 505-508 (4 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
Funding: National High Technology Research and Development Program of China (863 Program) (2006AA12A106); National Natural Science Foundation of China (60903130)
Keywords: k-means; parallel k-means; constrained clustering; constrained parallel k-means
