Improved Unbalanced Data Undersampling Algorithm For K-means Clustering (cited by: 4)
Original title: 改进K均值聚类的不平衡数据欠采样算法
Abstract: When handling imbalanced data, traditional undersampling methods consider only the absolute position of majority-class samples and ignore their relative position, so the resulting balanced dataset suffers from blurred class boundaries. This paper proposes an undersampling algorithm for imbalanced data based on improved K-means clustering (UD-PK). The algorithm first uses an improved PSO algorithm to search iteratively for the global optimum, which serves as the initial value required by K-means clustering, and then clusters the data with K-means. Next, it sets the number of majority-class samples to draw according to the ratio of majority-class to minority-class samples in each cluster, and selects the majority-class samples that take part in building the balanced dataset by their distance to the cluster center. Comparative experiments on UCI datasets show that the algorithm substantially improves minority-class accuracy over several classical algorithms.
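The cluster-then-sample step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the improved PSO, so the initial centers are simply passed in as `init_centers`; the per-cluster quota formula (weights proportional to majority/(minority + 1)) and the rule of preferring samples nearest the cluster center are assumptions chosen for illustration; the function names `kmeans` and `undersample` are hypothetical.

```python
import math
from collections import defaultdict


def kmeans(points, centers, iters=50):
    """Plain Lloyd's K-means starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def undersample(points, y, init_centers, minority=1):
    """Return sorted indices of a rebalanced subset (all minority + chosen majority)."""
    labels, centers = kmeans(points, init_centers)
    n_min = sum(1 for c in y if c == minority)

    clusters = defaultdict(list)
    for i, k in enumerate(labels):
        clusters[k].append(i)

    # ASSUMPTION: weight each cluster by majority / (minority + 1);
    # the +1 avoids division by zero in clusters with no minority samples.
    ratio = {}
    for k, idx in clusters.items():
        maj = sum(1 for i in idx if y[i] != minority)
        ratio[k] = maj / (len(idx) - maj + 1)
    total = sum(ratio.values())

    keep = [i for i, c in enumerate(y) if c == minority]  # keep every minority sample
    for k, idx in clusters.items():
        # ASSUMPTION: "preferred" majority samples are those closest to the center.
        maj_idx = sorted((i for i in idx if y[i] != minority),
                         key=lambda i: math.dist(points[i], centers[k]))
        quota = round(n_min * ratio[k] / total) if total else 0
        keep.extend(maj_idx[:quota])
    return sorted(keep)


# Toy data: label 0 is the majority class, label 1 the minority class.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.4, 0.6),
          (10, 10), (10, 12), (11, 10), (11, 11)]
y = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
balanced = undersample(points, y, init_centers=[(0, 0), (10, 10)])
```

Here `balanced` holds the indices of the retained samples. With roughly as many majority samples as minority samples kept overall, clusters dominated by the majority class contribute more of them, which is one plausible reading of the ratio-based quota in the abstract.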
Authors: YU Yan-li (于艳丽); JIANG Kai-zhong (江开忠); WANG Ke (王珂); SHENG Jing-wen (盛静文). School of Mathematics, Physics & Statistics, Shanghai University of Engineering Science; School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
Source: Software Guide (《软件导刊》), 2020, No. 6, pp. 205-209 (5 pages)
Keywords: unbalanced dataset; undersampling algorithm; K-means clustering; particle swarm optimization (PSO)
