Parallel k-means algorithm based on constrained information (基于约束信息的并行k-means算法)

Cited by: 8


Abstract: To obtain the clustering results that users expect on distributed data sets, a parallel k-means clustering algorithm based on constraint information is proposed. Building on the observation that parallel k-means can effectively cluster horizontally distributed data sets, the objective function of parallel k-means is modified and a constrained parallel k-means algorithm is designed. The constraint information of site users is introduced into the distributed clustering process in the form of chunklets, which guides the algorithm to perform a biased search. In theory, the constrained parallel k-means algorithm minimizes the intra-cluster distance of the unconstrained samples while also minimizing the average distance between the samples in each chunklet constraint and the corresponding cluster center. Experimental results show that the algorithm effectively improves the clustering accuracy of parallel k-means, and that in a distributed environment it produces clustering results equivalent to those of existing constrained clustering algorithms on a centralized data set.
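The abstract gives no pseudocode, so the following is only a minimal sketch of how chunklet constraints might be folded into a parallel k-means iteration. It rests on several assumptions: the sites are simulated in one process, squared Euclidean distance is used, each chunklet is a list of local sample indices that must share a cluster, and a chunklet is assigned to the center with the smallest mean squared distance to its members. The names `local_step` and `constrained_parallel_kmeans` are hypothetical and do not come from the paper.

```python
import numpy as np


def local_step(X, chunklets, centers):
    """One site's work for a single iteration: per-cluster sums and counts.

    X         : (n, d) array of the samples held at this site
    chunklets : list of index lists; samples in one list must share a cluster
    centers   : (k, d) array of the current global centers
    """
    k = centers.shape[0]
    # Squared Euclidean distance from every local sample to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Unconstrained samples go to their nearest center.
    labels = d2.argmin(axis=1)
    for idx in chunklets:
        # Assumption: a chunklet moves as a unit to the center with the
        # smallest mean squared distance to its members.
        labels[idx] = d2[idx].mean(axis=0).argmin()
    sums = np.zeros_like(centers, dtype=float)
    np.add.at(sums, labels, X)  # accumulate samples per assigned cluster
    counts = np.bincount(labels, minlength=k)
    return sums, counts


def constrained_parallel_kmeans(sites, init_centers, n_iter=20):
    """sites is a list of (X, chunklets) pairs, one pair per site."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Each site only reports sums and counts, never its raw samples.
        stats = [local_step(X, ch, centers) for X, ch in sites]
        sums = sum(s for s, _ in stats)
        counts = sum(c for _, c in stats)
        new_centers = centers.copy()
        nonempty = counts > 0  # keep the old center for an empty cluster
        new_centers[nonempty] = sums[nonempty] / counts[nonempty][:, None]
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers
```

Because the global update depends only on the summed statistics, running this sketch on the sites separately or on their concatenation (with chunklet indices shifted accordingly) yields the same centers, which mirrors the equivalence with the centralized setting that the abstract reports.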

Authors: 於跃成 (Yu Yuecheng), 王建东 (Wang Jiandong), 郑关胜 (Zheng Guansheng), 陈斌 (Chen Bin)

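Affiliations: College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics; School of Computer Science and Engineering, Jiangsu University of Science and Technology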

Source: Journal of Southeast University (Natural Science Edition) (《东南大学学报（自然科学版）》), 2011, No. 3, pp. 505-508 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals

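Funding: National High Technology Research and Development Program of China (863 Program) (2006AA12A106); National Natural Science Foundation of China (60903130)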

Keywords: k-means; parallel k-means; constrained clustering; constrained parallel k-means

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]





