
Discussion of Classification for Imbalanced Data Sets (非平衡数据集分类方法探讨)

Cited by: 9
Abstract: Many classification algorithms perform poorly on imbalanced data sets, in which the class distribution is highly skewed, yet the minority class is usually the one that matters most in real-world applications. Improving classification performance on the minority class has therefore become an active research topic in recent years. This paper discusses in detail the nature of the imbalanced classification problem, the factors that affect it, the methods commonly used to address it, the standard evaluation metrics, and the open problems and challenges that remain.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2012, Issue B06, pp. 304-308 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60773048)
Keywords: imbalanced data sets; classification; sampling methods; cost-sensitive learning

