
An Imbalanced Data Classification Algorithm Based on Undersampling (cited by: 20)
Abstract: Imbalanced Data Learning (IDL) is an active research problem in machine learning. To address it, this paper proposes a classification algorithm based on undersampling. The algorithm undersamples the majority class, retaining only the majority examples that lie near the classification boundary. With AUC as the optimization objective, it selects the most appropriate neighborhood radius so that the data set becomes balanced, then trains a Bayesian classifier on the undersampled examples. Classifier performance is also evaluated by AUC. Experiments on simulated data and UCI data sets show that the algorithm is effective.
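Source: Computer Engineering (《计算机工程》; indexed in CAS, CSCD, and the Peking University Core Journals list), 2011, No. 13, pp. 147-149 (3 pages)
Funding: National Key Technology R&D Program of China (2006BAK01A33); Key Scientific Research Fund of the Ministry of Public Security, Category B (20032252001); Jilin Province Science and Technology Development Plan (20070321, 20090704)
Keywords: machine learning; classification algorithm; imbalanced data; undersampling; neighborhood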
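
The abstract gives the procedure only in outline, so the following is a minimal sketch of one way it could be read, not the authors' implementation. It makes three assumptions. First, "near the classification boundary" is taken to mean "within a given radius of at least one minority example". Second, the Bayesian classifier is taken to be Gaussian naive Bayes. Third, the radius is picked from a candidate list by AUC on a held-out validation set. All function names, the toy data generator `make_data`, and the candidate radii are hypothetical.

```python
import math
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def undersample(X, y, radius):
    """Keep all minority examples (label 1) and only the majority examples
    (label 0) within `radius` of some minority example (assumed boundary rule)."""
    minority = [x for x, l in zip(X, y) if l == 1]
    kept = [(x, l) for x, l in zip(X, y)
            if l == 1 or any(math.dist(x, m) <= radius for m in minority)]
    return [x for x, _ in kept], [l for _, l in kept]

def fit_gaussian_nb(X, y):
    """Per-class prior plus per-feature mean and variance (assumed classifier)."""
    model = {}
    for c in (0, 1):
        rows = [x for x, l in zip(X, y) if l == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                 for col, m in zip(zip(*rows), means)]
        model[c] = (len(rows) / len(X), means, varis)
    return model

def score(model, x):
    """Log-odds of the minority class; higher means more likely minority."""
    def log_joint(c):
        prior, means, varis = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, varis))
    return log_joint(1) - log_joint(0)

def select_radius(X_tr, y_tr, X_val, y_val, radii):
    """Return (best_auc, best_radius) over the candidate radii, or None."""
    best = None
    for r in radii:
        Xs, ys = undersample(X_tr, y_tr, r)
        if ys.count(0) < 2:  # too few majority examples left to fit a model
            continue
        model = fit_gaussian_nb(Xs, ys)
        a = auc([score(model, x) for x in X_val], y_val)
        if best is None or a > best[0]:
            best = (a, r)
    return best

def make_data(n_major, n_minor, seed):
    """Hypothetical toy data: two 2-D Gaussian blobs of unequal size."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_major)]
    X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(n_minor)]
    return X, [0] * n_major + [1] * n_minor
```

A call such as `select_radius(*make_data(200, 20, 0), *make_data(200, 20, 1), [0.5, 1.0, 2.0, 4.0])` returns the best validation AUC together with the radius that achieved it. The sketch selects the radius by AUC alone; the abstract also says the chosen radius should leave the data balanced, and how the paper combines the two criteria is not stated there.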

References (6)

  • 1. He Haibo, Garcia E A. Learning from Imbalanced Data[J]. IEEE Trans. on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
  • 2. Chawla N V, Japkowicz N, Kolcz A. Editorial: Special Issue on Learning from Imbalanced Data Sets[J]. SIGKDD Explorations, 2004, 6(1): 1-6.
  • 3. Batista G E A, Prati R C, Monard M C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 20-29.
  • 4. Guo Husheng, Qi Hui, Wang Wenjian. Granular SVM Learning Algorithm for Imbalanced Data[J]. Computer Engineering, 2010, 36(2): 181-183. (in Chinese; cited by: 15)
  • 5. Fawcett T. An Introduction to ROC Analysis[J]. Pattern Recognition Letters, 2006, 27(8): 861-874.
  • 6. Tan P N, Steinbach M, Kumar V. Introduction to Data Mining[M]. Boston, Massachusetts, USA: Addison Wesley, 2005.
