Improved K-means clustering algorithm based on feature selection and removal on target point

Cited by: 17


Abstract: To address the inability of the K-means algorithm to suppress noise features effectively and to recover irregularly shaped clusters in high-dimensional data, an improved K-means clustering algorithm based on feature selection and removal on the target point is proposed. The algorithm uses the Minkowski metric as the distance measure for classifying target points. A weight adjustment parameter a is added and the weighting coefficient α is reset to perform feature selection and removal, which reduces the noise introduced by features that are not clustering indicators. The validation experiments run clustering analysis on real UCI datasets and artificial datasets to verify that the improved algorithm suppresses noise features. The algorithm is also compared experimentally with the WK-means and iMWK-means algorithms to analyze the applicability of feature selection during clustering, and the optimal distance coefficient β and weighting coefficient α are sought.
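The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea it describes: a feature-weighted Minkowski distance, weights derived from within-cluster dispersion, and removal of low-weight features. It follows the weight update commonly used in the WK-means family, with a single exponent `beta` applied to both the distance and the weights. The function names, the `threshold` parameter, and the toy data are assumptions made for illustration; the paper's adjustment parameter a and coefficient α are not modeled here.

```python
def weighted_minkowski(x, c, w, beta):
    """Feature-weighted Minkowski distance raised to the power beta (no root taken)."""
    return sum((wv ** beta) * abs(xv - cv) ** beta for xv, cv, wv in zip(x, c, w))


def feature_weights(points, labels, centers, beta, eps=1e-9):
    """WK-means-style weights (assumes beta > 1).

    Features with small within-cluster dispersion receive large weights;
    the weights sum to 1. eps avoids division by zero.
    """
    n_features = len(points[0])
    disp = [eps] * n_features
    for x, k in zip(points, labels):
        for v in range(n_features):
            disp[v] += abs(x[v] - centers[k][v]) ** beta
    exponent = 1.0 / (beta - 1.0)
    return [
        1.0 / sum((disp[v] / disp[u]) ** exponent for u in range(n_features))
        for v in range(n_features)
    ]


def select_features(weights, threshold):
    """Keep the indices of features whose weight reaches the (hypothetical) threshold."""
    return [v for v, w in enumerate(weights) if w >= threshold]


# Toy data: feature 0 separates the two clusters, feature 1 is noise.
points = [(0.0, 5.0), (0.1, -5.0), (10.0, 4.0), (10.1, -4.0)]
labels = [0, 0, 1, 1]
centers = [(0.05, 0.0), (10.05, 0.0)]

weights = feature_weights(points, labels, centers, beta=2.0)
kept = select_features(weights, threshold=0.1)
```

On this toy data the noisy feature receives a weight close to zero and is dropped, so `kept` contains only index 0. A full clustering loop would alternate point assignment (using `weighted_minkowski`), center updates, and weight updates; those steps are omitted here.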

Authors: YANG Hua-hui (杨华晖); MENG Chen (孟晨); WANG Cheng (王成); YAO Yun-zhi (姚运志) (Department of Missile Engineering, Army Engineering University, Shijiazhuang 050003, China)

Affiliation: Department of Missile Engineering, Army Engineering University

Source: Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core Journals list; 2019, No. 6, pp. 1219-1226 (8 pages)

Funding: National Natural Science Foundation of China (61501493)

Keywords: K-means algorithm; feature selection; high-dimensional data clustering; feature weighting; data denoising

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]


