Abstract
Classifying data streams with implicit concept drift is an active research topic in data mining, and the noise present in real-world data streams degrades classification quality. To address this, the paper proposes an ensemble classification algorithm for data streams that contain both noise and concept drift. The algorithm uses support vector machines as base classifiers, applies a Bayesian classifier to filter out noisy data, detects concept drift with dual thresholds determined by the Hoeffding bounds inequality, and dynamically updates the classification model to adapt to changes in the data stream. Experimental results show that the proposed algorithm effectively tracks and detects concept drift in noisy data streams and achieves good classification accuracy.
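The abstract does not state how the dual thresholds are derived, so the following is only a minimal sketch of one common way to build two thresholds from the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The function names, the two confidence levels `delta_warn` and `delta_drift`, and the rule of comparing the error rates of consecutive data blocks are all assumptions made for illustration, not the paper's actual procedure.

```python
import math


def hoeffding_bound(n, delta, value_range=1.0):
    """Hoeffding bound for the mean of n observations.

    value_range is the width R of the observed variable (1.0 for an
    error rate in [0, 1]); delta is the allowed failure probability.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def classify_change(prev_error, curr_error, n,
                    delta_warn=0.05, delta_drift=0.001):
    """Label the change in error rate between two blocks of n examples.

    Hypothetical rule: a smaller delta gives a larger bound, so
    delta_drift yields the upper threshold and delta_warn the lower one.
    """
    eps_warn = hoeffding_bound(n, delta_warn)
    eps_drift = hoeffding_bound(n, delta_drift)
    increase = curr_error - prev_error
    if increase > eps_drift:
        return "drift"      # change too large to attribute to chance
    if increase > eps_warn:
        return "warning"    # possible drift, or the effect of noise
    return "stable"


if __name__ == "__main__":
    for curr in (0.12, 0.17, 0.30):
        print(curr, classify_change(0.10, curr, n=500))
```

With blocks of 500 examples, the two bounds in this sketch come out to roughly 0.055 and 0.083, so only an error increase above the larger value is treated as drift. An intermediate increase is merely flagged, which is where a noise filter such as the Bayesian classifier mentioned in the abstract could plausibly be consulted.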
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2016, No. 7, pp. 1445-1449 (5 pages)
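Funding
National Natural Science Foundation of China (51174257, F030504)
Fundamental Research Funds for the Central Universities (2013BHZX0040)
Key Natural Science Project of the Anhui Provincial Department of Education (KJ2016A549)
Natural Science Project of Fuyang Normal College (2016FSKJ17)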
Keywords
data streams
noise
concept drift
classification
ensemble model