Abstract
To address the bottlenecks that the traditional fuzzy C-means (FCM) algorithm faces on large-scale data sets, namely high time complexity and insufficient memory, a multi-view adaptive fuzzy clustering algorithm based on sampling and blocking of large data sets is proposed. A neighborhood regularization constraint improves the noise resistance of traditional FCM, and low-rank and entropy-weighting constraints improve multi-view consistency, which makes the algorithm more adaptable to diverse data. Finally, initial cluster centers extracted with the Canopy algorithm, data sampling and blocking, and adaptive weighting improve the algorithm's adaptability to large-scale data clustering. Experimental results show that the proposed algorithm retains the good clustering performance of the traditional multi-view FCM algorithm while reducing computational complexity and improving clustering accuracy, making it suitable for clustering large-scale data sets.
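For orientation, the classical FCM baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of standard FCM only, not the proposed algorithm: it omits the neighborhood, low-rank, entropy-weighting, and sampling/blocking components, whose formulas the abstract does not give. The function name `fcm`, its parameters, and the choice to pass initial centers in as an argument (the abstract obtains them with Canopy) are assumptions made for this sketch.

```python
import math


def fcm(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Classical fuzzy C-means on a list of equal-length numeric tuples.

    init_centers: starting cluster centers (hypothetical argument; the
    abstract extracts them with Canopy, which is not implemented here).
    Returns (centers, u), where u[i][k] is the membership of points[i]
    in cluster k as computed in the final iteration; each row sums to 1.
    """
    c = len(init_centers)
    dim = len(points[0])
    n = len(points)
    centers = [list(v) for v in init_centers]
    u = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        u = []
        for p in points:
            # Clamp distances to avoid division by zero when a point
            # coincides with a center.
            d = [max(math.dist(p, v), 1e-12) for v in centers]
            u.append([
                1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for k in range(c)
            ])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([
                sum(w[i] * points[i][a] for i in range(n)) / total
                for a in range(dim)
            ])
        # Stop once no center moves by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Calling `fcm(points, init_centers)` on a small in-memory list returns the fitted centers and the fuzzy membership matrix. Each iteration touches every point for every cluster, which is the per-iteration cost and memory pressure that motivates sampling and blocking on large data sets.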
Authors
田彦彦
孙静
TIAN Yan-yan; SUN Jing (Mechanical and Electrical Engineering, Zhengzhou Institute of Industrial Application Technology, Zhengzhou, Henan 451100, China; Software College, Jilin University, Changchun, Jilin 130000, China)
Source
《机械设计与制造》
Peking University Core Journal
2021, Issue 9, pp. 279-282 (4 pages)
Machinery Design & Manufacture
Funding
Henan Province Science and Technology Research Project (152102210353).
Keywords
Large-Scale Data Sets Clustering
Neighborhood Regular Constraint
Multi-View Consistency
Data Sampling Block
Adaptive Weighted Clustering