
Improved CK-means+ algorithm and parallel implementation (cited by: 3)
Abstract: To reduce the influence of the uncertainty of the K value and the randomness of the initial cluster centers on clustering results, an improved K-means algorithm, CK-means+, is proposed, based on an optimized Canopy algorithm and a mean calculation method. The Canopy algorithm is optimized to reduce the influence of the uncertainty of the distance threshold T on the final output K value, and the K value and the initial center points are obtained through the Canopy algorithm and the mean calculation method. The algorithm is parallelized with the Spark framework and evaluated on UCI datasets; the experimental results show that, compared with other algorithms, CK-means+ is more efficient and adapts better to large-scale data application scenarios.
Authors: SHAO Jin-xin; XING Yan-ni; NAN Fang-zhe; ZHAO Xin; MA Ting-huai; QIAN Yu-rong (Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Software College, Xinjiang University, Urumqi 830046, China; School of International Education, Nanjing University of Information Science and Technology, Nanjing 210044, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2022, No. 5, pp. 1240-1248 (9 pages)
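Funding: National Natural Science Foundation of China (61966035); Innovation Team Fund of the Education Department of Xinjiang Uygur Autonomous Region (XJEDU2017T002).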
Keywords: Canopy algorithm; K-means algorithm; initial value K; initial center point; parallelization
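
The initialization idea described in the abstract (use Canopy to obtain K and the initial centers, then run K-means) can be illustrated with a minimal sketch. This is not the authors' CK-means+ implementation: the abstract does not specify how the Canopy algorithm is optimized or how the mean calculation method works, so the sketch assumes the classic two-threshold Canopy pass and simply takes the mean of each canopy as an initial center. The function names `canopy_centers` and `kmeans`, the thresholds `t1` and `t2`, and the sample points are all hypothetical, and the Spark parallelization is not reproduced.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def canopy_centers(points, t1, t2):
    """Classic Canopy pass (assumed variant, not the paper's optimized one).

    Each canopy collects the points within the loose threshold t1 of a seed;
    points within the tight threshold t2 are removed from further seeding.
    The mean of each canopy is returned as a candidate initial center, so
    the number of canopies serves as K.
    """
    if not (t1 >= t2 >= 0):
        raise ValueError("expected t1 >= t2 >= 0")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        members = [p for p in remaining if euclidean(seed, p) <= t1]
        remaining = [p for p in remaining if euclidean(seed, p) > t2]
        centers.append(mean_point(members))
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the supplied centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: euclidean(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean_point(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
initial = canopy_centers(points, t1=3.0, t2=1.5)
final_centers, clusters = kmeans(points, initial)
print(len(initial), final_centers)
```

In this sketch the result still depends on the choice of `t1` and `t2`, which is the sensitivity to the distance threshold T that the abstract says the optimized Canopy step is meant to reduce.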