Abstract
In designing fuzzy c-means (FCM) clustering algorithms for large-scale data, the large number of iterations needed to select cluster centers tends to leave the algorithm with poor scalability, weak sensitivity, and a tendency to fall into local minima. Taking the design of a data point reduction algorithm for large-scale data as its starting point, this paper studies initial cluster center selection and FCM model design. First, a data point reduction algorithm based on the K-nearest-neighbor idea is proposed to obtain a reduced set of representative points. Second, taking into account both the sparseness of the original data points and the distribution of the reduced representative points, a density-based rule and specific steps for selecting the initial cluster centers are proposed. Third, based on the representative point set and the initial cluster centers, a two-stage "reduce, then fuse" clustering algorithm is given. Finally, simulations demonstrate the effectiveness and superiority of the method.
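For orientation, the following is a minimal sketch of the standard FCM iteration that the abstract builds on, written in plain Python. It is not the paper's algorithm: it contains neither the K-nearest-neighbor reduction step nor the density-based center selection rule, and the function name `fcm`, its parameters, and the default fuzzifier `m=2.0` are illustrative assumptions. It only shows why the initial centers matter: they are an explicit input, and the iteration converges to a local minimum that depends on them.

```python
import math


def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means (illustrative sketch, not the paper's method).

    points  : list of equal-length numeric tuples
    centers : initial cluster centers (the result depends on this guess)
    m       : fuzzifier, must be > 1
    Returns (centers, memberships); memberships[i][j] is the degree to which
    points[i] belongs to cluster j, computed from the centers in use at the
    start of the final iteration.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        memberships = []
        for x in points:
            dists = [math.dist(x, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on one or more centers:
                # split full membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                memberships.append([
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ])
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        new_centers = []
        for j, old in enumerate(centers):
            weights = [u[j] ** m for u in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no point supports this center
                continue
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(dim)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

In a two-stage scheme of the kind the abstract describes, a routine like this would presumably be run on the reduced representative points with the density-selected centers as `centers`, rather than on the full data set with arbitrary starting values; that reading is an assumption based on the abstract alone.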
Authors
江文奇
黄容
牟华伟
袁亚纯
JIANG Wen-qi; HUANG Rong; MOU Hua-wei; YUAN Ya-chun (Department of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Source
《数学的实践与认识》
2021, No. 17, pp. 144-151 (8 pages)
Mathematics in Practice and Theory
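Funding
This paper is one of the research outcomes of the following projects:
National Natural Science Foundation of China (71971117)
Humanities and Social Sciences Fund of the Ministry of Education (17YJA630035)
Nanjing University of Science and Technology Independent Research Cultivation Project (30916011331)
Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX18_0490, KYCX18_0489)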
Keywords
clustering algorithm
FCM
initial cluster center
K-mutual nearest neighbor
data reduction