Detecting Concept Drift and Classifying Data Streams

Cited by: 9
Abstract: Mining data streams with concept drift is important for many real-time decision support systems. This paper uses statistical theory to estimate a confidence interval for the true error rate of a given model on the most recent concept, and uses that interval to detect, with a specified probability guarantee, whether concept drift has occurred in the data stream. The detection method and the KMM (kernel mean matching) algorithm are incorporated into an ensemble classifier framework, yielding a new data stream classification algorithm, WSEC. Experimental results on simulated and real data streams show that the algorithm is effective.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 3, pp. 421-425 (5 pages)
Funding: Supported by the Natural Science Foundation of Henan Province (2009A520025)
Keywords: concept drift; data stream mining; classification; ensemble
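
The drift test summarized in the abstract can be illustrated with a small sketch. The abstract does not state which statistic the paper uses, so the code below assumes a normal-approximation (Wald) confidence interval for a binomial error rate. The function names `error_rate_interval` and `drift_detected`, the `baseline_error` parameter, and the 95% default are hypothetical choices for illustration. This is not the WSEC algorithm itself and it omits the KMM and ensemble steps.

```python
import math
from statistics import NormalDist


def error_rate_interval(errors, n, confidence=0.95):
    """Approximate confidence interval for a model's true error rate.

    Assumption: normal approximation to the binomial (Wald interval);
    the paper's exact interval is not given in the abstract.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = errors / n  # observed error rate on the newest data chunk
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def drift_detected(baseline_error, errors, n, confidence=0.95):
    """Flag drift when the model's earlier error rate falls below the
    lower bound of the interval estimated on the newest chunk."""
    lower, _ = error_rate_interval(errors, n, confidence)
    return baseline_error < lower


# Hypothetical numbers: a model that erred on 10% of older data
# now misclassifies 30 of the 100 newest labelled examples.
print(error_rate_interval(30, 100))
print(drift_detected(0.10, 30, 100))
```

With these illustrative numbers the earlier error rate lies well below the interval, so the check reports drift. A baseline that falls inside the interval would not be flagged. For small chunks or error rates near 0 or 1 the Wald interval is unreliable, and a Wilson or exact interval would be a safer choice.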