Abstract
To address the instability of clustering results caused by the sensitivity of cluster centers in the classical K-means algorithm to outliers, this paper proposes an optimized K-means algorithm based on sample distribution density to improve clustering stability and accuracy. After clustering, two measures, the CH index and the share of the total sample falling in each classification interval, are used to objectively evaluate three discretization methods. The results show that the optimized K-means algorithm avoids unreasonable interval classification and reflects the distribution characteristics of the score samples more accurately.
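The abstract names the CH index as an evaluation measure but does not give its formula. The sketch below assumes the standard Calinski-Harabasz definition, the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom, and applies it to one-dimensional data. The function name `ch_index`, the sample scores, and both labelings are hypothetical illustrations, not data or code from the paper.

```python
def ch_index(values, labels):
    """Calinski-Harabasz index for 1-D values with one cluster label per value.

    Assumes the standard definition: [B / (k - 1)] / [W / (n - k)], where
    B is between-cluster dispersion and W is within-cluster dispersion.
    """
    n = len(values)
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more samples than clusters")

    overall_mean = sum(values) / n
    between = 0.0
    within = 0.0
    for members in clusters.values():
        center = sum(members) / len(members)
        between += len(members) * (center - overall_mean) ** 2
        within += sum((v - center) ** 2 for v in members)

    if within == 0:
        return float("inf")  # every cluster is a single repeated value
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical scores with three visible groups (not data from the paper).
scores = [52, 55, 58, 71, 73, 75, 90, 92, 95]
natural = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # intervals that follow the groups
shifted = [0, 0, 0, 0, 1, 1, 1, 2, 2]   # intervals that cut across the groups

print(ch_index(scores, natural) > ch_index(scores, shifted))
```

A higher value indicates tighter, better separated intervals, which is why the index can be used to compare discretization methods applied to the same scores.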
Authors
ZHANG Liang (张梁)
YANG Libo (杨立波)
ZHANG Xiaoyong (张小勇)
SHI Junbing (史俊冰)
Department of Intelligence and Automation, Taiyuan University, Taiyuan 030032, China
Source
Journal of TaiYuan University: Natural Science Edition (《太原学院学报(自然科学版)》)
2024, Issue 2, pp. 79-84 (6 pages)
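Funding
Shanxi Province Teaching Reform and Innovation Project (J20231427)
Shanxi Province College Students' Innovation and Entrepreneurship Training Program (20231442)
Shanxi College Students' Innovation and Entrepreneurship Training Program (20231472)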