Abstract
As a typical form of big data, data streams are continuous and unbounded, arrive rapidly, and are subject to concept drift, so traditional classification techniques cannot be applied directly and effectively to data stream mining. Building on the classic accuracy weighted ensemble (AWE) algorithm, this paper proposes the concept-adapting very fast decision tree (CVFDT) update ensemble (CUE) algorithm. CUE improves how weights are assigned to the base classifiers. It also noticeably reduces sensitivity to the data chunk size and increases the diversity among base classifiers. Experiments show that CUE achieves higher classification accuracy than AWE. Finally, the paper proposes the dynamic classifier selection with clustering (DCSC) algorithm, which is based on the idea of dynamic classifier selection. Because DCSC has no cumbersome weighting mechanism, it is relatively time-efficient. Experimental results confirm that DCSC is both effective and efficient and can handle concept drift effectively.
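The abstract contrasts two ways of combining base classifiers: weighting them (AWE, which CUE builds on) and selecting one classifier per input region (DCSC). The abstract does not give the details of CUE's weighting or of DCSC, so the sketch below only illustrates the two general ideas under stated assumptions. The weighting function follows the rule from the original AWE paper, w_i = MSE_r - MSE_i computed on the newest chunk, with negative weights clipped to zero here as a simplification. The selection function is a generic clustering-based dynamic classifier selection using a naive k-means. Classifiers are modeled as plain functions returning class-probability dicts, and every name (`awe_weights`, `dcs_fit`, `dcs_predict`, and so on) is hypothetical.

```python
"""Illustrative sketch (not the paper's CUE/DCSC code): AWE-style weighting
versus clustering-based dynamic classifier selection on one data chunk."""
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical classifier model: a function mapping a feature vector to
# a {class_label: probability} dict. Missing labels count as probability 0.
Classifier = Callable[[Sequence[float]], Dict[str, float]]
Chunk = List[Tuple[List[float], str]]


def awe_weights(classifiers: List[Classifier], chunk: Chunk) -> List[float]:
    """AWE rule w_i = MSE_r - MSE_i on the newest chunk (negatives clipped to 0 here)."""
    n = len(chunk)
    prior = [k / n for k in Counter(y for _, y in chunk).values()]
    mse_r = sum(p * (1 - p) ** 2 for p in prior)  # error of a random guesser
    weights = []
    for clf in classifiers:
        mse_i = sum((1 - clf(x).get(y, 0.0)) ** 2 for x, y in chunk) / n
        weights.append(max(0.0, mse_r - mse_i))
    return weights


def weighted_predict(classifiers, weights, x) -> str:
    """Weighted vote: sum w_i * P_i(c | x) over classifiers, return the argmax."""
    score: Counter = Counter()
    for clf, w in zip(classifiers, weights):
        for label, p in clf(x).items():
            score[label] += w * p
    return score.most_common(1)[0][0]


def top_label(clf: Classifier, x) -> str:
    return max(clf(x).items(), key=lambda kv: kv[1])[0]


def dist2(a, b) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(centroids, x) -> int:
    return min(range(len(centroids)), key=lambda j: dist2(centroids[j], x))


def kmeans(points, k: int, iters: int = 10):
    """Naive k-means: first k points as seeds, fixed number of iterations."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def dcs_fit(classifiers: List[Classifier], chunk: Chunk, k: int):
    """Cluster the chunk, then record the most accurate classifier per cluster."""
    centroids = kmeans([x for x, _ in chunk], k)
    hits = [[0] * len(classifiers) for _ in range(k)]
    for x, y in chunk:
        j = nearest(centroids, x)
        for i, clf in enumerate(classifiers):
            hits[j][i] += top_label(clf, x) == y
    best = [max(range(len(classifiers)), key=lambda i: row[i]) for row in hits]
    return centroids, best


def dcs_predict(classifiers, centroids, best, x) -> str:
    """Route x to its nearest cluster and use that cluster's chosen classifier."""
    return top_label(classifiers[best[nearest(centroids, x)]], x)


if __name__ == "__main__":
    always_a = lambda x: {"a": 1.0}
    always_b = lambda x: {"b": 1.0}
    mid = lambda x: {"a": 0.8, "b": 0.2} if x[0] < 4.5 else {"a": 0.2, "b": 0.8}
    clfs = [always_a, always_b, mid]
    chunk = [([-5.0], "a"), ([-4.0], "a"), ([4.0], "b"), ([5.0], "b")]
    w = awe_weights(clfs, chunk)
    centroids, best = dcs_fit(clfs, chunk, k=2)
    print("AWE weights:", [round(v, 3) for v in w])
    print("weighted vote at 4.2:", weighted_predict(clfs, w, [4.2]))
    print("selection at 4.2:", dcs_predict(clfs, centroids, best, [4.2]))
```

On this toy chunk only `mid` beats the random-guess baseline overall, so the weighted vote follows `mid` everywhere, while the selection step hands each cluster to a constant classifier that is perfect on that cluster and needs no weights at all. This is a toy contrast, not a reproduction of the paper's experiments.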
Authors
Han Donghong (韩东红)
Ma Xianzhe (马宪哲)
Li Lili (李莉莉)
Wang Guoren (王国仁)
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
Indexed in CSCD and the Peking University core journal list (北大核心)
2018, No. 6, pp. 1021-1033 (13 pages)
Funding
Supported by the National Natural Science Foundation of China (Nos. 61173029, 61272182, 61672144, 61332006).
Keywords
data streams
base classifier
ensemble classifier
decision tree
concept drift
clustering