Abstract
In recent years, classification of data streams has attracted wide attention, and concept drift detection is one of its key problems. Existing classification models are either ensembles built from a single type of base classifier or ensembles built from hybrid base classifiers, and their drift detection mechanisms mostly rest on idealized distributional assumptions (the stationary and learnable assumptions). Single-type ensembles may amplify classification error, and their accuracy degrades on noisy data streams, while hybrid ensembles often struggle to balance classification accuracy against time cost. Motivated by this, we propose WE-DTB, an ensemble classification method that extends the simple WE ensemble framework with a hybrid of decision tree and Naive Bayes base classifiers. It detects concept drift in data streams using two typical detection mechanisms, Hoeffding bounds and the μ test, and classifies accordingly. Extensive experiments show that WE-DTB detects concept drift effectively while maintaining good classification accuracy and good time and space performance.
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2012, No. 1, pp. 152-155, 181 (5 pages)
Funding
National Natural Science Foundation of China (60975034)
Natural Science Foundation of Anhui Province (090412044)
Keywords
Data streams
Concept drift
Classification
Noise