Method of Concept Drifting Detection Based on Martingale in Data Stream (基于鞅的数据流概念漂移检测方法)

Cited by: 3
Abstract: Detecting concept drift in data streams has become an active research topic in recent years. Drawing on statistical theory, this paper proposes a martingale-based concept drift detection method for data streams (CDDBM). First, it proposes an effective strangeness measure that jointly accounts for drift caused by changes in the centroid and in the radius of the data distribution. Second, it applies two-sided statistics to the stream so that the data distribution is characterized more accurately and mapped to a uniformly distributed sequence. Finally, it computes the mean of the double-sided randomized power martingale and uses the stopping time theorem to decide whether concept drift has occurred. In addition, the method uses a suitable threshold parameter to control the peak of the martingale, which reduces both the false alarm rate and the missed alarm rate, and it uses an appropriately sized window to cope with the unbounded length of the stream and to infer the drift interval more accurately. Experiments on synthetic and real-world data streams show that the method is effective on numerical, categorical, and mixed-attribute data and keeps the error rate under control.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal; 2013, No. 8, pp. 1787-1792 (6 pages)
Keywords: concept drifting; martingale; data stream; exchangeability; strangeness measure
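
As a rough illustration of the kind of martingale test the abstract outlines, the sketch below runs a plain one-sided randomized power martingale over one-dimensional data. It is not CDDBM itself: the centroid-distance strangeness score, the values `epsilon = 0.92` and `threshold = 1000`, and the function name `detect_drift` are assumptions chosen for illustration, and the paper's radius-aware measure, two-sided statistics, martingale averaging, and windowing are not reproduced.

```python
import math
import random


def detect_drift(stream, epsilon=0.92, threshold=1000.0, seed=0):
    """Return the index of the first point at which the randomized power
    martingale exceeds `threshold`, or None if it never does."""
    rng = random.Random(seed)   # source of the tie-breaking randomization
    points = []                 # observations seen so far
    log_m = 0.0                 # log of the martingale value, with M_0 = 1

    for index, x in enumerate(stream):
        points.append(x)
        centroid = sum(points) / len(points)
        # Strangeness (assumed form): distance of each point to the centroid.
        scores = [abs(p - centroid) for p in points]
        newest = scores[-1]
        greater = sum(1 for s in scores if s > newest)
        equal = sum(1 for s in scores if s == newest)  # counts the newest point too
        # Randomized p-value; uniform on [0, 1] when the data are exchangeable.
        p_value = (greater + rng.random() * equal) / len(points)
        p_value = max(p_value, 1e-12)  # guard against log(0)
        # Power martingale update: M_n = M_{n-1} * epsilon * p_n^(epsilon - 1).
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p_value)
        if log_m > math.log(threshold):
            return index
    return None


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Hypothetical stream: the mean shifts from 0 to 6 at index 200.
    demo = [demo_rng.gauss(0.0, 1.0) for _ in range(200)]
    demo += [demo_rng.gauss(6.0, 1.0) for _ in range(200)]
    print("drift flagged at index:", detect_drift(demo))
```

The threshold plays the role the abstract assigns to it: while the stream is exchangeable, the maximal inequality for nonnegative martingales bounds the probability of the martingale ever exceeding a threshold λ by 1/λ, so a larger threshold trades slower detection for fewer false alarms. Recomputing every strangeness score at each step costs quadratic time in this sketch, which is one reason a bounded window is attractive on an unbounded stream.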