Abstract
To address the poor clustering quality and high time complexity of traditional big data clustering algorithms, this paper proposes an iteratively optimized fuzzy C-means (FCM) algorithm for big data based on incremental random sampling. The algorithm introduces an incremental technique into big data clustering, processes the data points within each block in parallel, and does not need to store the large membership matrix during iteration. As a result, it greatly reduces running time and storage space without degrading the quality of the clustering results, and so speeds up clustering. Experimental results show that, compared with other clustering algorithms, the proposed algorithm performs well on several clustering metrics and is effective for big data clustering in terms of both computational efficiency and scalability.
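The abstract does not give the update rules, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes the standard FCM membership and center formulas, computes each point's memberships on the fly so that no N-by-c membership matrix is kept, and grows a random sample stage by stage while reusing the previous centers. The function names and the parameters `start_frac`, `growth`, `iters`, and `tol` are hypothetical, and the per-block parallelism mentioned in the abstract is omitted for brevity.

```python
import random


def memberships(point, centers, m=2.0, eps=1e-12):
    """Standard FCM memberships of one point to every center."""
    # Squared distances; eps avoids division by zero when a point hits a center.
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) + eps
          for center in centers]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def update_centers(points, centers, m=2.0):
    """One FCM center update that never stores the full membership matrix."""
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]  # running sums of u^m * x
    den = [0.0] * len(centers)            # running sums of u^m
    for x in points:
        # Memberships are computed per point and discarded after accumulation.
        for j, uj in enumerate(memberships(x, centers, m)):
            w = uj ** m
            den[j] += w
            for t in range(dim):
                num[j][t] += w * x[t]
    return [[n / den[j] for n in num[j]] for j in range(len(centers))]


def incremental_fcm(data, n_clusters, m=2.0, start_frac=0.1, growth=2.0,
                    iters=20, tol=1e-6, seed=0):
    """Hypothetical incremental scheme: cluster a growing random sample."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    size = max(n_clusters, int(start_frac * len(data)))
    # Assumption: initial centers are the first few randomly ordered points.
    centers = [list(data[i]) for i in order[:n_clusters]]
    while True:
        sample = [data[i] for i in order[:size]]
        for _ in range(iters):
            new = update_centers(sample, centers, m)
            shift = max(abs(a - b)
                        for cn, co in zip(new, centers)
                        for a, b in zip(cn, co))
            centers = new
            if shift < tol:
                break
        if size >= len(data):
            return centers
        # Enlarge the sample and warm-start from the current centers.
        size = min(len(data), int(size * growth))
```

Calling `incremental_fcm(data, n_clusters)` on a list of equal-length numeric tuples returns the cluster centers as lists of floats. Because `update_centers` keeps only per-cluster running sums, its memory use grows with the number of clusters and dimensions rather than with the number of points, which is the property the abstract attributes to avoiding the membership matrix.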
Author
SHI Yuanbo (施媛波)
Business School, Yunnan Normal University, Kunming, Yunnan 650106, China
Source
Information & Computer (《信息与电脑》)
2021, No. 3, pp. 73-76 (4 pages)
Funding
Scientific Research Fund of the Yunnan Provincial Department of Education (Grant Nos. 2019J1048, 2019J1042).
Keywords
big data
incremental clustering algorithm
parallel computing
fuzzy C-means