Online Data Stream Mining for Seriously Unbalanced Applications
Abstract: Using an ensemble of classifiers trained on sequential blocks of instances is a popular strategy for mining data streams with concept drift. For seriously unbalanced applications, where the number of examples per class in a data block differs greatly, the usual way of building training blocks yields high accuracy on the majority classes but low accuracy on the minority classes with far fewer instances, so existing algorithms cannot meet the needs of such applications. To address this problem, this paper improves MAE (Memorizing based Adaptive Ensemble), an ensemble data stream learning algorithm based on a recall mechanism, and proposes UMAE (Unbalanced data learning based on MAE), an online data stream learning algorithm for seriously unbalanced applications. UMAE maintains an equal-sized sliding window for each class. When a new data block arrives, each of its examples enters the sliding window of its own class; a new training block is then assembled from the instances currently held in the per-class windows and used to generate a new classifier for online learning. Comparisons with five typical data stream mining algorithms show that UMAE satisfies real-time requirements while achieving high overall accuracy and greatly improving accuracy on the minority classes with very few examples; for applications with severely skewed class distributions, such as anomaly detection, its practicality is clearly superior to that of the other algorithms.
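The core mechanism in the abstract is the per-class sliding window that rebalances each arriving data block before a new ensemble member is trained. Below is a minimal Python sketch of that step under stated assumptions: the class name PerClassWindows, the window_size parameter, and the (x, y) example format are illustrative choices, not the authors' implementation, and the classifier trained on each rebalanced block is left abstract.

```python
from collections import defaultdict, deque

class PerClassWindows:
    """Equal-sized sliding window of recent examples for every class.

    A hypothetical sketch of the rebalancing step described in the
    abstract, not the authors' actual UMAE implementation.
    """

    def __init__(self, window_size):
        # One bounded deque per class label; once a window is full,
        # its oldest examples are evicted automatically.
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def update(self, block):
        """Route each (x, y) example of an arriving data block into the
        sliding window of its own class y."""
        for x, y in block:
            self.windows[y].append((x, y))

    def training_block(self):
        """Assemble a rebalanced block from the current window contents;
        each class contributes at most window_size examples."""
        return [ex for window in self.windows.values() for ex in window]

# Illustrative use: a skewed block with one class-1 example among six.
pcw = PerClassWindows(window_size=3)
pcw.update([([0.1], 0), ([0.2], 0), ([0.3], 0),
            ([0.4], 0), ([0.5], 0), ([0.9], 1)])
print(len(pcw.training_block()))  # 4 examples: 3 of class 0, 1 of class 1
```

The bounded deques capture both aspects described in the abstract: each class keeps only its most recent window_size examples (a simple forgetting mechanism), and because every class contributes up to the same number of examples, minority classes are no longer drowned out when the rebalanced block is used to train the next ensemble classifier.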
Source: Computer Science (《计算机科学》), CSCD, Peking University core journal, 2017, No. 6, pp. 255-259 (5 pages).
Funding: Supported by the National Natural Science Foundation of China (61272141, 61120106005, 61472136) and the Fund of the State Key Laboratory of High Performance Computing, National University of Defense Technology (201513-02).
Keywords: Online learning; Data stream mining; Recalling and forgetting mechanisms; Unbalanced data learning