Abstract
Traditional clustering methods use Euclidean distance as the measurement scale, so correlation between variables can distort the clustering results. This paper proposes a clustering method that combines K-means with the Mahalanobis-Taguchi system. First, K-means clustering is performed on all data to generate K initial classes. Next, an outlier detection method based on robust Mahalanobis distance is applied to each initial class to remove outliers, and K robust Mahalanobis spaces are constructed. Finally, the Mahalanobis distance of each data point to each of the K robust Mahalanobis spaces is calculated, and the point is assigned to the class with the minimum Mahalanobis distance. Numerical experiments show that the proposed method significantly improves on the K-means clustering results and also performs well in comparison with existing methods.
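The three steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which robust estimator is used, so the trimmed re-estimation in `robust_space` is an assumed stand-in, and the function names, the `keep` fraction, and the number of trimming rounds are all hypothetical. The sketch also assumes at least two variables and that every initial class has more points than variables.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance; returns integer labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def sq_mahalanobis(X, mean, cov):
    """Squared Mahalanobis distance of each row of X to (mean, cov)."""
    diff = X - mean
    return np.einsum("ij,ij->i", diff @ np.linalg.pinv(cov), diff)

def robust_space(Xc, keep=0.9, n_rounds=5):
    """Assumed robust estimator: refit mean/covariance on the `keep`
    fraction of points with the smallest Mahalanobis distance."""
    n_keep = max(Xc.shape[1] + 1, int(np.ceil(keep * len(Xc))))
    subset = Xc
    for _ in range(n_rounds):
        mean = subset.mean(axis=0)
        cov = np.atleast_2d(np.cov(subset, rowvar=False))
        d2 = sq_mahalanobis(Xc, mean, cov)
        subset = Xc[np.argsort(d2)[:n_keep]]  # drop the most distant points
    return subset.mean(axis=0), np.atleast_2d(np.cov(subset, rowvar=False))

def kmeans_mts(X, k, keep=0.9, seed=0):
    """Step 1: K-means. Step 2: one robust Mahalanobis space per class.
    Step 3: reassign each point to the class with the smallest distance."""
    initial = kmeans(X, k, seed=seed)
    spaces = [robust_space(X[initial == j], keep) for j in range(k)]
    d2 = np.column_stack([sq_mahalanobis(X, m, c) for m, c in spaces])
    return initial, d2.argmin(axis=1)
```

Calling `initial, refined = kmeans_mts(X, k=2)` on an `(n, p)` array returns the K-means labels and the reassigned labels. The Mahalanobis-Taguchi system customarily scales the squared distance by the number of variables; that constant factor does not change which class attains the minimum, so it is omitted here.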
Authors
Sheng Zhirong (生志荣)
Cheng Longsheng (程龙生)
Affiliations: School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China; Nanjing Normal University Taizhou College, Taizhou, Jiangsu 225300, China
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal (北大核心)
2021, No. 14, pp. 45-48 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (7127114).
Keywords
Mahalanobis-Taguchi system
K-means clustering
Mahalanobis space
robust Mahalanobis distance