Abstract
An improved spectral clustering algorithm is proposed for reciprocating compressor fault data, whose complex spatial distribution defeats conventional clustering algorithms. The algorithm computes the affinity matrix in a new way: because the fault data are distributed on manifolds, geodesic distance replaces the Euclidean distance as the measure of similarity between data points; a neighborhood density factor computed for each data point is used to identify and remove noise points; and a density-based local Euclidean distance adjustment is applied to regions where the gap between manifolds is too small. Tests on several artificial datasets and a real reciprocating compressor fault dataset show that the improved algorithm clusters well on data that are manifold-distributed, multi-scale, or noisy, and whose manifolds have small gaps or even cross one another. Its clustering accuracy is 50.86% and 8.6% higher than that of conventional k-means and the MSCA spectral clustering algorithm, respectively.
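To illustrate why geodesic distance can serve manifold-distributed data better than Euclidean distance, the following is a minimal sketch and not the authors' implementation. It assumes a common construction that the abstract does not specify: geodesic distance is approximated by shortest-path length over a k-nearest-neighbor graph, and the affinity is a Gaussian kernel of that distance. The function names and the parameters `k` and `sigma` are hypothetical, chosen only for this example. The noise-removal and local distance-adjustment steps are omitted because the abstract does not define them.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from source (inf if unreachable)."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_affinity(points, k=2, sigma=1.0):
    """Gaussian affinity on graph geodesic distance (assumed kernel form)."""
    graph = knn_graph(points, k)
    n = len(points)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dist = geodesic_distances(graph, i)
        for j in range(n):
            if i != j:  # diagonal left at zero
                affinity[i][j] = math.exp(-dist[j] ** 2 / (2 * sigma ** 2))
    return affinity


# Toy manifold: 20 points on a unit semicircle.
arc = [(math.cos(math.pi * t / 19), math.sin(math.pi * t / 19)) for t in range(20)]
euclid = math.dist(arc[0], arc[-1])                  # straight across the gap
geo = geodesic_distances(knn_graph(arc, 2), 0)[19]   # along the arc
print(euclid, geo)
```

On this toy arc the two endpoints are separated by a Euclidean distance of 2, while the graph geodesic follows the curve and comes out close to the arc length π. The resulting affinity matrix could then be passed to any spectral clustering routine that accepts a precomputed affinity.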
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2012, No. 8, pp. 1-7 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 61075001)
Keywords
reciprocating compressor
spectral clustering
geodesic distance
Euclidean distance adjustment