Abstract
The hard c-means clustering algorithm with effectiveness factors (HCMef) is highly sensitive to the effectiveness factor index when clustering samples that form more than two clusters of unequal size: an inappropriately chosen index causes some clusters to disappear. This paper searches the index exhaustively over values near zero and reduces the computation by exploiting the observation that, when the index is large, a cluster's population vanishes within a small number of training epochs. An adaptive hard c-means algorithm with effectiveness factors (AHCMef) is proposed, together with a two-phase clustering method that further improves the success rate and efficiency of clustering. The results show that the optimal index lies near the larger end of the range of values for which clustering succeeds, and that clustering quality degrades as the index decreases from that point. These findings offer guidance for choosing the effectiveness factor index when HCMef is applied to data sets with multiple clusters of unequal size.
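The exhaustive search with early termination described in the abstract can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the paper's algorithm: the abstract does not give the form of the effectiveness factor, so the code assumes a hypothetical one in which the squared distance to cluster j is divided by (n_j / n) ** p, where n_j is the cluster's current population and p is the index. The function names `hcm_ef` and `sweep_index` are likewise illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(pts):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def hcm_ef(points, centers, p, max_epochs=100):
    """Hard c-means with an ASSUMED population-based effectiveness factor.

    Assumption (not from the source): cluster j's squared distance is
    divided by (n_j / n) ** p. With p = 0 this is plain hard c-means.
    Returns (labels, centers, epochs), or None as soon as a cluster
    loses all of its members.
    """
    n, c = len(points), len(centers)
    centers = list(centers)
    sizes = [n / c] * c  # start from equal populations
    labels = None
    for epoch in range(1, max_epochs + 1):
        weights = [(s / n) ** p for s in sizes]
        new_labels = []
        for x in points:
            costs = [sq_dist(x, centers[j]) / weights[j] for j in range(c)]
            new_labels.append(min(range(c), key=costs.__getitem__))
        sizes = [new_labels.count(j) for j in range(c)]
        if 0 in sizes:
            return None  # a cluster disappeared: stop early, skip this index
        centers = [
            mean([x for x, lab in zip(points, new_labels) if lab == j])
            for j in range(c)
        ]
        if new_labels == labels:
            return labels, centers, epoch  # assignments stopped changing
        labels = new_labels
    return labels, centers, max_epochs


def sweep_index(points, init_centers, indices):
    """Try each candidate index; keep only runs where no cluster vanished."""
    results = {}
    for p in indices:
        out = hcm_ef(points, init_centers, p)
        if out is not None:
            results[p] = out
    return results
```

Calling `sweep_index(points, init_centers, [-0.2, -0.1, 0.0, 0.1, 0.2])` returns a dictionary keyed by the index values for which every cluster survived. Because a run is abandoned in the epoch where a cluster first becomes empty, index values that are too large cost only a few epochs each. How the best surviving index is then chosen is left out here, since the abstract does not state the criterion.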
Source
《计算机科学》
CSCD
Peking University Core Journal
2009, No. 1, pp. 222-226 (5 pages)
Computer Science
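Funding
Shanghai Municipal Science and Technology Commission Project (065115023)
Supported by the National Key Basic Research Development Program (973 Program) Project (2002CB312001)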
Keywords
Clustering, Cluster population, C-means, Effectiveness factor