Research on Key Issues in Imbalanced Data Classification Based on a New Ensemble Classifier (cited by: 8)

Study on source of classification in imbalanced datasets based on new ensemble classifier
Abstract To address classification in imbalanced datasets, this paper proposes a resampling algorithm based on differentiated sampling rates (differentiated sampling rate algorithm, DSRA) and, building on DSRA, designs a new ensemble classifier (SVM-Ripper ensemble classifier, SREC). SREC uses its own classifier selection strategy, classifier integration strategy, and classification decision scheme, and achieves high classification accuracy. SREC is also used to study the key factors that affect classification in imbalanced datasets. The results indicate that the difficulty of imbalanced data classification arises from a combination of factors, including imbalance between the positive and negative classes, imbalance within a class, sample size, and the degree of imbalance; only by considering these factors together can the problem be addressed well.
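Source Systems Engineering and Electronics (《系统工程与电子技术》), indexed in EI, CSCD, and the Peking University Core Journals list, 2011, No. 1, pp. 196-201 (6 pages)
Funding Supported by the National Natural Science Foundation of China (60675030, 60875029)
Keywords data mining; classification in imbalanced datasets; ensemble classifier; key issues (source)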
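
The abstract names two ideas that a small example may help make concrete: the degree of imbalance of a dataset, and resampling each class at its own rate. The sketch below is not the paper's DSRA, whose rule for choosing rates is not given in this record; it is a generic per-class random resampler under assumed rates. The function names `imbalance_ratio` and `resample_with_rates`, the toy data, and the rates 0.5 and 3.0 are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def resample_with_rates(samples, labels, rates, seed=0):
    """Resample each class at its own rate (hypothetical helper, not DSRA).

    rates maps a class label to a sampling rate. A rate below 1 randomly
    undersamples that class without replacement; a rate above 1 keeps every
    original sample and adds random duplicates; a missing label means 1.0.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        target = round(rates.get(y, 1.0) * len(xs))
        if target <= len(xs):
            chosen = rng.sample(xs, target)
        else:
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_samples.extend(chosen)
        out_labels.extend([y] * len(chosen))
    return out_samples, out_labels


# Toy data (assumed): 90 negative samples and 10 positive samples.
samples = list(range(100))
labels = [0] * 90 + [1] * 10

# Assumed rates: halve the majority class, triple the minority class.
new_samples, new_labels = resample_with_rates(samples, labels, {0: 0.5, 1: 3.0})
print(imbalance_ratio(labels), imbalance_ratio(new_labels))
```

With these assumed rates the class counts go from 90 and 10 to 45 and 30, so the imbalance ratio drops from 9.0 to 1.5. Random duplication is only the simplest form of oversampling; it does nothing about within-class imbalance, which the abstract lists as a separate factor.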


