
Research on Boosting-Based Classification Algorithms for Imbalanced Data
Abstract: Real-world classification tasks often face imbalanced data, in which one class has more examples than the other. Among current approaches to classifying imbalanced data, boosting is promising because it can raise a classifier's overall performance by improving minority-class metrics over successive iterations. Starting from an analysis of why classifiers perform poorly on imbalanced data, this paper modifies the classical boosting algorithm by suppressing overfitting and controlling the minority-class F-measure, and proposes an improved algorithm named RIFBoost. The algorithm is then compared with several traditional classification algorithms on the WEKA system. The experimental results indicate that RIFBoost improves the minority-class F-measure to some extent while preserving overall accuracy.
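The abstract's emphasis on the minority-class F-measure rather than overall accuracy can be illustrated with a small sketch. It is not the RIFBoost algorithm, whose details are not given here. It only computes the standard F-measure (harmonic mean of precision and recall) for one class. The toy labels and the two hypothetical classifiers (`always_majority`, `better`) are assumptions made up for illustration.

```python
def f_measure(y_true, y_pred, positive=1):
    """Standard F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 95 majority-class (0) and 5 minority-class (1) examples.
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: always predicts the majority class.
always_majority = [0] * 100

# Hypothetical classifier B: 2 false alarms on the majority class,
# and 4 of the 5 minority examples found.
better = [0] * 93 + [1] * 2 + [1] * 4 + [0]

for name, pred in [("always_majority", always_majority), ("better", better)]:
    print(name, "accuracy:", accuracy(y_true, pred),
          "minority F-measure:", round(f_measure(y_true, pred), 3))
```

On this toy data the always-majority classifier reaches 95% accuracy with a minority-class F-measure of 0, which is why accuracy alone is a weak criterion when classes are imbalanced.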
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2009, No. 25, pp. 138-140, 156 (4 pages)
Keywords: imbalanced data; boosting algorithm; Waikato Environment for Knowledge Analysis (WEKA) system; F-measure


