Abstract
The traditional k-means algorithm is ill-suited to environments that involve uncertainty, and existing three-way k-means clustering methods still inherit its random selection of initial cluster centers, which makes the clustering results unstable. To address both problems, this paper proposes an improved k-means algorithm. Using a hierarchical clustering algorithm and a mathematical sampling method, together with a defined validity index for evaluating clustering results, it obtains a set of good initial centers and uses them as the initial cluster centers of k-means. Three-way decision clustering theory is then introduced to refine the clustering results so that the method adapts to environments with uncertainty. Experiments on UCI datasets show that the method improves clustering quality, accuracy, and stability.
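The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's validity index, linkage criterion, sampling scheme, or three-way assignment rule, so everything concrete here is an assumption: sum of squared errors stands in for the validity index, centroid linkage for the hierarchical step, and a distance-ratio rule for splitting points into core and fringe regions. All function names and parameters are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def agglomerative_centers(points, k):
    """Merge the two clusters with the closest centroids until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        cents = [mean(c) for c in clusters]
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: dist(cents[ab[0]], cents[ab[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [mean(c) for c in clusters]


def sse(points, centers):
    """Assumed stand-in for the validity index: lower is better."""
    return sum(min(dist(p, c) for c in centers) ** 2 for p in points)


def pick_initial_centers(points, k, n_samples=5, sample_size=20, seed=0):
    """Cluster several random samples hierarchically; keep the best centers."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    candidates = [agglomerative_centers(rng.sample(points, size), k)
                  for _ in range(n_samples)]
    return min(candidates, key=lambda cs: sse(points, cs))


def kmeans(points, centers, iters=50):
    """Lloyd's iterations starting from the supplied centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[idx].append(p)
        new = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


def three_way_assign(points, centers, ratio=1.5):
    """Assumed rule: a point close to exactly one center joins that cluster's
    core; a point comparably close to several joins each of their fringes."""
    core = [[] for _ in centers]
    fringe = [[] for _ in centers]
    for p in points:
        d = [dist(p, c) for c in centers]
        near = [i for i, di in enumerate(d) if di <= ratio * min(d)]
        target = core if len(near) == 1 else fringe
        for i in near:
            target[i].append(p)
    return core, fringe
```

A typical call would be `centers = kmeans(points, pick_initial_centers(points, k))` followed by `core, fringe = three_way_assign(points, centers)`, where `points` is a list of coordinate tuples. Because the initial centers come from a seeded sampling step rather than a fresh random draw, repeated runs with the same seed give the same result.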
Authors
蔺艳艳
陆介平
王郁鑫
傅廷妍
LIN Yanyan; LU Jieping; WANG Yuxin; FU Tingyan (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212001)
Source
《计算机与数字工程》
2020, No. 6, pp. 1294-1299, 1353 (7 pages in total)
Computer & Digital Engineering