IKnnM-DHecoc: A Method for Handling the Problem of Concept Drift (cited by: 13)
Abstract: As data stream mining is applied ever more widely, classifying data streams with concept drift has become an important and challenging task. Because the underlying concept in such streams changes over time, an effective learner must track these changes and adapt to them quickly. This paper proposes IKnnM-DHecoc, a dynamic hierarchical error-correcting output coding (ECOC) algorithm based on incremental KnnModel, for handling concept drift in data streams. The stream is first divided into data blocks, and each block is pre-learned with the incremental KnnModel algorithm. Based on the pre-learning results, a class hierarchy tree and the corresponding hierarchical coding are built and updated; an incrementally trainable classification algorithm then learns the partitions defined by the coding, producing a set of classifiers and a set of candidate classifiers. Finally, nodes of the hierarchy tree are pruned according to their activity to reduce computational cost. In the prediction phase, incremental KnnModel and dynamic hierarchical ECOC are combined so that each contributes its strengths. Experimental results show that IKnnM-DHecoc not only makes model learning more dynamic and classification more accurate, but also adapts quickly to concept drift.
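Two generic building blocks named in the abstract, splitting a stream into data blocks and decoding with an error-correcting output code, can be illustrated with a minimal sketch. This is not the paper's IKnnM-DHecoc algorithm: the codebook below is a fixed, flat, hand-written example (an assumption for illustration), there is no class hierarchy tree, no KnnModel, and no pruning, and the names `split_into_blocks`, `ecoc_decode`, and `CODEBOOK` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Sequence


def split_into_blocks(stream: Iterable, block_size: int) -> Iterator[List]:
    """Yield consecutive blocks of at most block_size items from a stream."""
    block: List = []
    for item in stream:
        block.append(item)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # emit the final, possibly shorter, block
        yield block


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits: Sequence[int], codebook: Dict[str, Sequence[int]]) -> str:
    """Return the class whose codeword is nearest to bits in Hamming distance."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


# Hypothetical codebook: one codeword per class, one column per binary classifier.
# Every pair of codewords differs in 4 positions, so any single wrong
# binary-classifier output is still decoded to the correct class.
CODEBOOK = {
    "a": (0, 0, 0, 0, 0, 0),
    "b": (1, 1, 1, 1, 0, 0),
    "c": (0, 0, 1, 1, 1, 1),
}

if __name__ == "__main__":
    # A stream of 7 items split into blocks of 3.
    print(list(split_into_blocks(range(7), 3)))
    # Binary outputs for a class "b" instance with the fourth bit flipped.
    print(ecoc_decode((1, 1, 1, 0, 0, 0), CODEBOOK))
```

In a drift-aware setting the codebook and the binary classifiers behind each column would be updated block by block; the sketch keeps them fixed to show only the decoding step.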
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2011, No. 4, pp. 592-601 (10 pages)
Funding: National Natural Science Foundation of China (61070062); Ministry of Education Fund for Returned Overseas Scholars (教外司留[2008]890号)
Keywords: concept drift; data stream; error correcting output code; incremental KnnModel; classification

References (23)

  • 1. Folino G, Pizzuti C, Spezzano G. An adaptive distributed ensemble approach to mine concept-drifting data streams[C]//Proc of the 19th IEEE Int Conf on Tools with Artificial Intelligence. Piscataway, NJ: IEEE, 2007: 183-188.
  • 2. Wang Haixun, Fan Wei, Yu P S, et al. Mining concept-drifting data streams using ensemble classifiers[C]//Proc of the 9th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2003: 226-235.
  • 3. Tsymbal A. The problem of concept drift: Definitions and related work, TCD-CS-2004-15[R]. Dublin, Ireland: Department of Computer Science, Trinity College, 2004.
  • 4. Hulten G, Spencer L, Domingos P. Mining time-changing data streams[C]//Proc of the 7th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2001: 97-106.
  • 5. Babcock B, Babu S, Datar M, et al. Models and issues in data stream systems[C]//Proc of the 21st ACM SIGACT-SIGMOD-SIGART Symp on Principles of Database Systems. New York: ACM, 2002: 1-16.
  • 6. Widmer G, Kubat M. Learning in the presence of concept drift and hidden contexts[J]. Machine Learning, 1996, 23(1): 69-101.
  • 7. Domingos P, Hulten G. Mining high-speed data streams[C]//Proc of the 6th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2000: 71-80.
  • 8. Gama J, Rocha R, Medas P. Accurate decision trees for mining high-speed data streams[C]//Proc of the 9th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2003: 523-528.
  • 9. Gama J, Medas P, Rocha R. Forest trees for on-line data[C]//Proc of the 19th ACM Symp on Applied Computing. New York: ACM, 2004: 632-636.
  • 10. Gama J, Castillo G. Learning with local drift detection[G]//LNAI 4093: Proc of the 2nd Int Conf on Advanced Data Mining and Applications. Berlin: Springer, 2006: 42-55.

Secondary references (57)

  • 1. Jin Cheqing, Qian Weining, Zhou Aoying. Analysis and management of streaming data: A survey[J]. Journal of Software, 2004, 15(8): 1172-1181 (in Chinese). Cited by: 161
  • 2. Yang Yidong, Sun Zhihui, Zhang Jing. Outlier detection in distributed data streams based on kernel density estimation[J]. Journal of Computer Research and Development, 2005, 42(9): 1498-1504 (in Chinese). Cited by: 8
  • 3. Sun Yufen, Lu Yansheng. A survey of stream data mining[J]. Computer Science, 2007, 34(1): 1-5 (in Chinese). Cited by: 36
  • 4. Qian Jiangbo, Xu Hongbing, Dong Yisheng, Wang Yongli, Liu Xuejun, Yang Xuemei. An optimization algorithm for data stream window joins based on minimum spanning trees[J]. Journal of Computer Research and Development, 2007, 44(6): 1000-1007 (in Chinese). Cited by: 3
  • 5. GHANI R. Combining labeled and unlabeled data for multiclass text categorization[C]//ICML: Proceedings of the 19th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann, 2002: 187-194.
  • 6. WINDEATT T, ARDESHIR G. Boosted ECOC ensembles for face recognition[C]//VIE 2003: International Conference on Visual Information Engineering. Washington, DC: IEEE Press, 2003: 165-168.
  • 7. ZHOU J, SUEN C Y. Unconstrained numeral pair recognition using enhanced error correcting output coding: A holistic approach[C]//Proceedings of the 8th International Conference on Document Analysis and Recognition. Washington, DC: IEEE Computer Society, 2005, 1: 484-488.
  • 8. LUO DI-JUN, XIONG RONG. Distance function learning in error-correcting output coding framework[C]//ICONIP 2006: Proceedings of the 13th International Conference on Neural Information Processing, LNCS 4233. Berlin: Springer-Verlag, 2006: 1-10.
  • 9. ALLWEIN E L, SCHAPIRE R E, SINGER Y. Reducing multiclass to binary: A unifying approach for margin classifiers[J]. Journal of Machine Learning Research, 2002, 1: 113-141.
  • 10. PASSERINI A, PONTIL M, FRASCONI P. New results on error correcting codes of kernel machines[J]. IEEE Transactions on Neural Networks, 2004, 15(1): 45-54.

Co-citing literature (47)

Co-cited literature (249)

  • 1. Liu Yaozong, Wang Yongli, Liu Fengyu, Zhang Hong. A data stream classifier that adapts to concept change[J]. Journal of Computer Research and Development, 2007, 44(z2): 63-68 (in Chinese). Cited by: 1
  • 2. Su Jinshu, Zhang Bofeng, Xu Xin. Advances in machine learning based text categorization[J]. Journal of Software, 2006, 17(9): 1848-1859 (in Chinese). Cited by: 376
  • 3. Cheng Yi, Miao Duoqian, Feng Qinrong. Granular computation based on fuzzy rough sets[J]. Computer Science, 2007, 34(7): 142-145 (in Chinese). Cited by: 4
  • 4. Fu Chunyan, Ge Maosong. A data stream classification method that adapts to concept drift[J]. CAAI Transactions on Intelligent Systems, 2007, 2(4): 86-91 (in Chinese). Cited by: 5
  • 5. Zeng Huajun, Zhang Yinkui. Machine Learning[M]. Beijing: China Machine Press, 2003: 60-79 (in Chinese).
  • 6. MASUD M M, GAO J, KHAN L, et al. Mining concept-drifting data stream to detect peer to peer botnet traffic[EB/OL]. [2012-01-04]. http://www.utdallas.edu/~mmm058000/reports/UTDCS-05-08.pdf.
  • 7. CRUPI V, GUGLIELMINO E, MILAZZO G. Neural-network-based system for novel fault detection in rotating machinery[J]. Journal of Vibration and Control, 2004, 10(8): 1137-1150.
  • 8. DELANY S J, CUNNINGHAM P, TSYMBAL A. A comparison of ensemble and case-base maintenance techniques for handling concept drift in spam filtering[C]//FLAIRS'2006: Proceedings of 19th International Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2006: 340-345.
  • 9. MASUD M M, GAO J, KHAN L, et al. A practical approach to classify evolving data streams: Training with limited amount of labeled data[C]//ICDM '08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining. Washington, DC: IEEE Computer Society, 2008: 929-934.
  • 10. WIDMER G, KUBAT M. Learning in the presence of concept drift and hidden contexts[J]. Machine Learning, 1996, 23(1): 69-101.

Citing literature (13)

Secondary citing literature (72)
