A classification approach for learning concept-drift in noisy data stream

Cited by: 1
Abstract: Classifying data streams with hidden concept drift is an active research topic in data mining. Noise in real-world data directly affects both the detection of concept drift and the quality of classification, so a noise-tolerant stream classification method has considerable research and practical value. An ensemble of random decision trees is an effective model for data stream classification. Building on this model, the paper introduces the Hoeffding Bounds inequality to detect concept drift and distinguish it from noise, dynamically adjusts the sliding-window size and the drift-detection period according to the detection results, and proposes an incremental ensemble classification method, ICDC. Experiments on synthetic and real streaming databases show that ICDC handles concept drift in noisy data streams effectively compared with several known single and ensemble online algorithms.
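Source: Journal of University of Science and Technology of China (JUSTC), 2011, No. 4, pp. 347-352 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.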
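Funding: Supported by the National Basic Research Program of China (973 Program) (2009CB326203), the National Natural Science Foundation of China (60975034), and the Natural Science Foundation of Anhui Province (090412044).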
Keywords: data mining; data stream; random decision tree; concept drift; noise
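
The abstract says that ICDC uses the Hoeffding Bounds inequality to tell concept drift from noise and then adjusts the sliding-window size, but this record does not give the actual decision rule. The following is only a minimal sketch of how such a test might look. The two-threshold rule (one bound for noise, twice the bound for drift), the halving/doubling window policy, and all function names and default parameters are assumptions made for illustration, not the authors' algorithm.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the mean of n
    independent samples of a variable with the given range lies within
    epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def classify_change(prev_error: float, curr_error: float, n: int,
                    delta: float = 0.05) -> str:
    """Label the rise in windowed error rate between two consecutive windows.

    Hypothetical rule (an assumption, not taken from the paper):
      rise > 2 * epsilon          -> "drift"
      epsilon < rise <= 2*epsilon -> "noise"
      otherwise                   -> "stable"
    """
    eps = hoeffding_bound(1.0, delta, n)  # error rates lie in [0, 1]
    rise = curr_error - prev_error
    if rise > 2.0 * eps:
        return "drift"
    if rise > eps:
        return "noise"
    return "stable"


def next_window_size(size: int, state: str,
                     min_size: int = 100, max_size: int = 10000) -> int:
    """Hypothetical window policy: shrink on drift so stale data is dropped
    quickly, grow when stable, and leave the window unchanged on noise."""
    if state == "drift":
        return max(min_size, size // 2)
    if state == "stable":
        return min(max_size, size * 2)
    return size


if __name__ == "__main__":
    window = 1000
    for prev_err, curr_err in [(0.10, 0.11), (0.10, 0.15), (0.10, 0.30)]:
        state = classify_change(prev_err, curr_err, n=window)
        print(prev_err, curr_err, state, next_window_size(window, state))
```

The bound tightens as the window grows, so a larger window flags smaller error increases. This is one reason a method of this kind would want to tie the window size to the detection result.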
