
Partial Iterative Fast K-means Clustering Algorithm (局部迭代的快速K-means聚类算法)

Cited by: 9
Abstract: As the number of clusters grows, K-means becomes increasingly likely to start from poorly chosen initial centers, which degrades the clustering result. To address this problem, this paper proposes a partial iterative fast K-means clustering algorithm, Partial Iterative Fast K-means plus-minus (PIFKM+−). Starting from a K-means clustering, the algorithm repeatedly looks for a cluster that can be split and a cluster that can be removed, and in each iteration re-clusters only the local data affected by that change. This lowers the time complexity of updating the clustering and improves its quality. When the number of clusters is large, PIFKM+− updates clusters quickly, is insensitive to the initial cluster centers, and improves clustering accuracy. Comparisons with K-means and K-means++ on simulated and real data sets verify the accuracy, efficiency, and scalability of the algorithm, and a statistical analysis of the experimental results shows that it improves clustering accuracy without losing much time efficiency.
Authors: LI Feng (李峰), LI Mingxiang (李明祥), ZHANG Yujing (张宇敬). Affiliations: Information Management and Engineering Department, Hebei Finance University, Baoding, Hebei 071051, China; Applied Technology Research and Development Center of Wisdom Finance in Hebei University, Baoding, Hebei 071051, China.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2020, No. 13, pp. 63-71 (9 pages).
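Funding: Youth Fund of the Hebei Provincial Department of Education (No. QN2019186); Key Project of the Hebei Provincial Department of Education (No. ZD2019136); Project of the Hebei Universities Wisdom Finance Research and Development Center (No. XGJ2018001).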
Keywords: K-means algorithm; cluster segmentation; cluster removing; partial iterative clustering; cluster neighbor
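The split-and-remove iteration described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' PIFKM+− implementation: the abstract does not state how the clusters to split or remove are chosen, so the criteria here (split the cluster with the largest within-cluster squared error, remove the other cluster with the smallest, hand its points to the nearest remaining center, accept the move only if the total squared error drops) are assumptions made for illustration, and the names `lloyd`, `sse`, `plus_minus_step`, and `pifkm_sketch` are hypothetical.

```python
import numpy as np


def lloyd(X, centers, n_iter=50):
    """Plain Lloyd (K-means) iterations; returns (centers, labels)."""
    centers = centers.copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):  # leave an empty cluster's center in place
                centers[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)


def sse(X, centers, labels):
    """Total squared distance from each point to its assigned center."""
    return float(((X - centers[labels]) ** 2).sum())


def plus_minus_step(X, centers, labels, rng):
    """One split/remove move followed by a local re-clustering (assumed criteria)."""
    k = len(centers)
    cost = np.array([((X[labels == j] - centers[j]) ** 2).sum() for j in range(k)])
    split = int(cost.argmax())  # assumption: split the cluster with the largest error
    members = np.flatnonzero(labels == split)
    if k < 2 or cost[split] == 0.0 or len(members) < 2:
        return centers, labels
    # Assumption: remove the other cluster with the smallest error.
    remove = min((j for j in range(k) if j != split), key=lambda j: cost[j])
    # The nearest remaining center acts as the removed cluster's neighbor.
    dist = ((centers - centers[remove]) ** 2).sum(axis=1)
    dist[remove] = np.inf
    neighbor = int(dist.argmin())
    affected = sorted({split, remove, neighbor})
    mask = np.isin(labels, affected)
    # Two random members seed the halves of the split cluster.
    seeds = X[rng.choice(members, size=2, replace=False)]
    local = seeds if neighbor == split else np.vstack([seeds, centers[neighbor]])
    # Re-cluster only the affected points, not the whole data set.
    local, _ = lloyd(X[mask], local)
    keep = np.array([j for j in range(k) if j not in affected], dtype=int)
    new_centers = np.vstack([centers[keep], local])
    d = ((X[:, None, :] - new_centers[None, :, :]) ** 2).sum(axis=2)
    new_labels = d.argmin(axis=1)  # a single global assignment pass
    if sse(X, new_centers, new_labels) < sse(X, centers, labels):
        return new_centers, new_labels
    return centers, labels  # reject moves that do not reduce the error


def pifkm_sketch(X, k, n_moves=20, seed=0):
    """K-means from random initial centers, then repeated split/remove moves."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    centers, labels = lloyd(X, centers)
    for _ in range(n_moves):
        centers, labels = plus_minus_step(X, centers, labels, rng)
    return centers, labels
```

Calling `pifkm_sketch(X, k)` on a float array `X` of shape `(n, d)` returns `k` centers and one label per point. Because a move is kept only when it lowers the total squared error, the result is never worse than the plain K-means run it starts from; how the published algorithm selects and evaluates moves may differ.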
