Abstract
K-means is one of the best-known clustering algorithms. However, it relies entirely on Euclidean distance and ignores how the dispersion of sample features affects the clustering result. As a result, samples near cluster boundaries are easily assigned to the wrong cluster, the algorithm tends to converge to local optima, and clustering accuracy is low. To address these shortcomings, this paper proposes a likelihood k-means clustering algorithm. For all samples in each cluster, it takes into account the dispersion of the sample features in each dimension and computes the likelihood probability that a sample belongs to each cluster, which effectively improves clustering accuracy. The superiority of the algorithm is verified on artificial and benchmark data sets. It is then applied to pattern recognition of gas-path component failures and sensor failures in a turbofan engine, which verifies its practicality and effectiveness for turbofan engine failure diagnostics.
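The abstract does not give the exact likelihood formula, so the following is only a minimal sketch of the general idea, assuming each cluster is modelled as an axis-aligned Gaussian whose per-dimension variance captures the feature dispersion. The function name `likelihood_kmeans`, the parameter `var_floor`, the initialisation step, and the fallback for near-empty clusters are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np


def likelihood_kmeans(X, k, n_iter=50, var_floor=1e-6, seed=0):
    """Hard clustering that assigns each sample to the cluster with the
    highest diagonal-Gaussian log-likelihood (an assumed likelihood model)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Assumed initialisation: one ordinary Euclidean assignment step
    # around k randomly chosen samples.
    centers = X[rng.choice(n, size=k, replace=False)]
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)

    means = np.empty((k, d))
    variances = np.empty((k, d))
    for _ in range(n_iter):
        for j in range(k):
            members = X[labels == j]
            if len(members) < 2:
                # Hypothetical fallback: re-seed a degenerate cluster
                # from two random samples so a variance can be computed.
                members = X[rng.choice(n, size=2, replace=False)]
            means[j] = members.mean(axis=0)
            # Per-dimension dispersion of the cluster, floored for stability.
            variances[j] = members.var(axis=0) + var_floor

        # Log-likelihood of every sample under every cluster, shape (n, k).
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        log_lik = -0.5 * (
            diff2 / variances[None, :, :]
            + np.log(2.0 * np.pi * variances[None, :, :])
        ).sum(axis=2)

        new_labels = log_lik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

    return labels, means, variances
```

Compared with a plain Euclidean assignment, a sample in this sketch is drawn toward a cluster whose spread along each dimension makes that sample plausible, rather than simply toward the nearest centre. Like standard k-means, the result still depends on the initial assignment.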
Authors
卢俊杰
黄金泉
鲁峰
LU Junjie; HUANG Jinquan; LU Feng (Jiangsu Province Key Laboratory Power Systems, College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal
2020, No. 9, pp. 136-141 (6 pages)
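Funding
National Natural Science Foundation of China (No.51276087)
Jiangsu Province Graduate Research Innovation Program (No.KYCX170281)
Nanjing University of Aeronautics and Astronautics Doctoral Dissertation Innovation and Excellence Fund (No.BCXJ17-02)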
Keywords
K-means clustering
likelihood probability
turbofan engine
gas path failure
pattern recognition