
Improved KNN algorithm in classification of imbalanced data sets (用于不均衡数据集分类的KNN算法)

Cited by: 9
Abstract: When the KNN algorithm is applied to imbalanced data sets, its classification accuracy on the minority class is poor. An improved algorithm, G-KNN, is proposed to address this problem. The algorithm applies a crossover operator and a mutation operator to the minority class samples to generate new minority class samples. A new sample is considered valid only if its Euclidean distance to its parent samples is less than the maximum distance between the parent minority class samples; valid samples are then added to the pool used to generate minority class samples in the next round. Experiments on UCI data sets show that, compared with applying random over-sampling to KNN, the method achieves better classification accuracy on the minority class.
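Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 28, pp. 143-145, 236 (4 pages)
Funding: Natural Science Foundation of Shandong Province (No. ZR2010FM021); Shandong Province Science and Technology Research Program (No. 2007ZZ17, No. 2008GG10001015, No. 2008B0026); Research Project of the Shandong Provincial Department of Education (No. J09LG02)
Keywords: imbalanced data sets; K-Nearest Neighbor (KNN) algorithm; over-sampling; crossover operator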
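
The over-sampling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the operators, so the arithmetic crossover, the Gaussian mutation, the `rate` and `scale` parameters, and all function names here are assumptions. The validity test is read as "the child's distance to each of its two parents is below the largest pairwise distance in the current minority pool", which is one plausible interpretation of the abstract. At least two minority samples are required.

```python
import math
import random
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_pairwise_distance(samples):
    """Largest Euclidean distance between any two samples in the pool."""
    return max(euclidean(a, b) for a, b in combinations(samples, 2))


def crossover(p1, p2, rng):
    # Assumed operator: arithmetic crossover (random point on the segment p1-p2).
    alpha = rng.random()
    return [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]


def mutate(child, rng, rate=0.1, scale=0.05):
    # Assumed operator: each feature gets Gaussian noise with probability `rate`.
    return [x + rng.gauss(0, scale) if rng.random() < rate else x for x in child]


def oversample_minority(minority, n_new, rounds=3, seed=0):
    """Make `n_new` generation attempts per round; return the accepted samples."""
    rng = random.Random(seed)
    pool = [list(s) for s in minority]
    generated = []
    for _ in range(rounds):
        # Threshold: maximum distance between the current parent samples.
        threshold = max_pairwise_distance(pool)
        accepted = []
        for _ in range(n_new):
            p1, p2 = rng.sample(pool, 2)
            child = mutate(crossover(p1, p2, rng), rng)
            # Keep the child only if it stays close enough to both parents.
            if euclidean(child, p1) < threshold and euclidean(child, p2) < threshold:
                accepted.append(child)
        # Valid samples join the parent pool for the next round.
        pool.extend(accepted)
        generated.extend(accepted)
    return generated


def knn_predict(train_x, train_y, query, k=3):
    """Plain KNN: majority vote among the k nearest training samples."""
    nearest = sorted(zip(train_x, train_y), key=lambda xy: euclidean(xy[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In use, the samples returned by `oversample_minority` would be appended to the training set with the minority label before calling `knn_predict`, so that minority neighbors are less likely to be outvoted by the majority class.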


