
Concept drift detection method with limited amount of labeled data (cited by: 7)
Abstract: Traditional classification algorithms for concept-drifting data streams usually rely on the true class labels of the test data to detect whether concept drift has occurred and to adjust the classification model as needed. Obtaining true labels, however, costs considerable human and material resources, and the continuous arrival of high-speed data streams makes this approach impractical in real-world applications. To address this problem, a concept drift detection algorithm based on a small number of class labels is proposed. It builds on the way the fast KNNModel algorithm classifies with model clusters: without knowing the class labels of the incoming data, it decides whether concept drift has occurred by testing whether the number of instances in the current data block that are not covered by any model cluster has increased significantly, at a given significance level, compared with the previous data block. When concept drift is detected, domain experts are asked to label only the small number of instances not covered by the model clusters, and these representative instances are used to update the model, which addresses both drift detection and model self-updating. Experimental results show that the method classifies data streams quickly while adapting to concept drift, and achieves classification accuracy close to or higher than that of traditional data stream classification algorithms.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 8, pp. 2176-2181, 2185 (7 pages)
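Funding: National Natural Science Foundation of China (61070062, 61175123); Major Science and Technology Project for University-Industry Cooperation of Fujian Province (2010H6007)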
Keywords: concept drift; data stream; classification; KNNModel; model cluster
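
The detection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the cluster representation or the statistical test, so everything below is an assumption made for illustration: model clusters are treated as hyperspheres with a center and radius (the hypothetical `ModelCluster` type), coverage is measured by Euclidean distance, and the "significant increase" is checked with a one-sided two-proportion z-test at a 5% level (`z_crit=1.645`). The paper's own procedure may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ModelCluster:
    # Hypothetical representation: a hypersphere with a class label.
    center: Sequence[float]
    radius: float
    label: str


def is_covered(x: Sequence[float], clusters: List[ModelCluster]) -> bool:
    """True if x lies inside at least one model cluster."""
    return any(math.dist(x, c.center) <= c.radius for c in clusters)


def uncovered(block, clusters: List[ModelCluster]) -> list:
    """Instances of a data block that no model cluster covers."""
    return [x for x in block if not is_covered(x, clusters)]


def drift_detected(n_prev_uncovered: int, n_prev: int,
                   n_cur_uncovered: int, n_cur: int,
                   z_crit: float = 1.645) -> bool:
    """One-sided two-proportion z-test (an assumed choice of test):
    has the uncovered fraction grown significantly in the current block?"""
    p_prev = n_prev_uncovered / n_prev
    p_cur = n_cur_uncovered / n_cur
    pooled = (n_prev_uncovered + n_cur_uncovered) / (n_prev + n_cur)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_prev + 1.0 / n_cur))
    if se == 0.0:
        # Both blocks are entirely covered or entirely uncovered: no change.
        return False
    return (p_cur - p_prev) / se > z_crit
```

In this sketch no class labels are needed for detection. Only when `drift_detected` returns true would the list returned by `uncovered` for the current block be passed to a domain expert for labeling and then used to update the clusters.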