Abstract
As data stream mining finds ever wider application, classifying data streams with concept drift has become an important and challenging task. Given the characteristics of such streams, an effective learner must track these changes and adapt to them quickly. This paper proposes a dynamic hierarchical error-correcting output code algorithm based on incremental KnnModel (IKnnM-DHecoc) to handle concept drift. The data stream is divided into data blocks, and each block is pre-learned with the incremental KnnModel algorithm. Based on the pre-learning results, a class hierarchy tree and its hierarchical coding matrix are built and updated. An incrementally trainable classification algorithm is then trained on the partitions defined by the codes, producing a set of classifiers and a set of candidate classifiers. Finally, nodes of the hierarchy tree are pruned according to their activity to reduce computational cost. In the prediction phase, IKnnModel and DHecoc are combined so that the prediction draws on the strengths of both. Experimental results show that IKnnM-DHecoc not only improves the adaptivity of model learning and the classification accuracy, but also adapts quickly to concept drift.
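Two generic building blocks mentioned in the abstract, splitting a stream into blocks and decoding with an error-correcting output code, can be illustrated with a minimal sketch. This is not the IKnnM-DHecoc algorithm itself: the helper names `blocks` and `ecoc_decode`, the three-class code matrix `CODES`, and the use of plain Hamming-distance decoding are all assumptions made for illustration only.

```python
from itertools import islice


def blocks(stream, size):
    """Yield consecutive lists of at most `size` items from an iterable stream."""
    it = iter(stream)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def hamming(a, b):
    """Count the positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix, outputs):
    """Return the class whose codeword is closest (Hamming) to `outputs`."""
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], outputs))


# Hypothetical code matrix: 3 classes, 5 binary classifiers.
# The minimum pairwise distance is 3, so one wrong classifier output
# can still be corrected.
CODES = {
    "a": (0, 0, 0, 0, 0),
    "b": (1, 1, 1, 0, 0),
    "c": (0, 0, 1, 1, 1),
}

if __name__ == "__main__":
    for block in blocks(range(5), 2):
        print(block)
    # Codeword of "b" with its third bit flipped.
    print(ecoc_decode(CODES, (1, 1, 0, 0, 0)))
```

In a block-wise learner, each block yielded by `blocks` would be used to update the binary classifiers, and `ecoc_decode` would map their joint outputs back to a class label.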
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal (北大核心)
2011, No. 4, pp. 592-601 (10 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China (61070062)
Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (教外司留[2008]890号)
Keywords
concept drift
data stream
error-correcting output code
incremental KnnModel
classification