
Ensemble Model and Algorithm with Recalling and Forgetting Mechanisms for Data Stream Mining (cited by: 15)
Abstract: Ensemble learning over sequential chunks of training instances is a popular strategy for mining data streams with concept drift. To address the limitations of traditional ensemble approaches, this paper introduces human recalling and forgetting mechanisms into data stream mining and proposes a memorizing based data stream mining (MDSM) model. The model treats base classifiers as knowledge acquired by the system. Through the "recalling and forgetting" mechanism, base classifiers that proved useful in the past keep a high memory strength and are retained in a "memory repository", which improves prediction stability; the base classifiers that perform best on the current data are then selected from the repository for ensemble prediction, which improves adaptability to concept changes. Based on MDSM, the paper proposes an ensemble data stream mining algorithm, MAE (memorizing based adaptive ensemble), which uses the Ebbinghaus forgetting curve to design the forgetting mechanism and ensemble pruning (selective ensemble) to emulate human "recalling". Compared with four typical data stream mining algorithms, MAE achieves high and stable classification accuracy with moderate training time. It adapts well to concept drift overall, especially to recurring concept drift and to the complex concept drift found in real applications: it adapts quickly to new concept changes and effectively resists the effect of random concept fluctuations on system performance.
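The interplay of forgetting and recalling described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's MAE algorithm: this record gives no formulas, so the exponential retention form R = exp(-t / S), the rule that each recall adds 1 to the memory strength, the top-k selection by accuracy on the current chunk, and all names (`MemoryItem`, `MemoryRepository`, `retention`, `predict`) are assumptions made for illustration only.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical base classifier type: maps one numeric feature to a label.
Classifier = Callable[[float], int]
Chunk = Sequence[Tuple[float, int]]


@dataclass
class MemoryItem:
    clf: Classifier
    strength: float = 1.0  # memory strength S (assumed to grow with each recall)
    last_recall: int = 0   # index of the chunk at which the item was last recalled


def retention(item: MemoryItem, now: int) -> float:
    """Assumed forgetting curve: R = exp(-t / S), t = chunks since last recall."""
    return math.exp(-(now - item.last_recall) / item.strength)


def accuracy(clf: Classifier, chunk: Chunk) -> float:
    """Fraction of instances in the chunk that the classifier labels correctly."""
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


class MemoryRepository:
    def __init__(self, k: int = 2, forget_below: float = 0.05):
        self.items: List[MemoryItem] = []
        self.k = k                        # ensemble size after pruning
        self.forget_below = forget_below  # retention threshold for forgetting

    def update(self, new_clf: Classifier, chunk: Chunk, now: int) -> List[MemoryItem]:
        """Store the new classifier, recall the k best on `chunk`, forget the rest
        whose retention has fallen below the threshold."""
        self.items.append(MemoryItem(new_clf, last_recall=now))
        ranked = sorted(self.items, key=lambda m: accuracy(m.clf, chunk), reverse=True)
        recalled = ranked[: self.k]
        for m in recalled:
            m.strength += 1.0   # recalling reinforces the memory (assumption)
            m.last_recall = now
        self.items = [m for m in self.items if retention(m, now) >= self.forget_below]
        return recalled


def predict(ensemble: List[MemoryItem], x: float) -> int:
    """Majority vote over the recalled classifiers."""
    votes = Counter(m.clf(x) for m in ensemble)
    return votes.most_common(1)[0][0]
```

In this sketch, a classifier trained on an earlier concept stays in the repository while its retention is above the threshold, so when that concept recurs it can be recalled immediately instead of being relearned. This is the behavior the abstract attributes to the memory repository under recurring concept drift.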
Source: Journal of Software (《软件学报》), 2015, No. 10, pp. 2567-2580 (14 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61272141, 60905032, 61120106005, 61273232)
Keywords: data stream mining; concept drift; recalling and forgetting; Ebbinghaus forgetting curve; ensemble pruning