Abstract
Compared with static datasets, data streams are massive, real-time, dynamic, and ordered, which makes data stream classification considerably more complex. In this setting, a traditional single classifier is constrained by the learning capacity and adaptability of a single model, so classification performance degrades; on unbounded data streams it may become extremely poor. To address this problem, an ensemble classifier model for data stream classification is constructed based on the idea of ensemble learning. The designed ensemble includes decision tree, KNN, SOM, and SVM algorithms, among others, and finally applies naive Bayes to reduce the impact of noise. Experimental results on synthetic and real-world datasets show that the ensemble model achieves better classification performance than a single model.
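The general idea of classifying a stream with several cooperating models can be illustrated with a minimal sketch. It is not the paper's model: the three base learners below (`OneNN`, `NearestCentroid`, `MajorityClass`) are hypothetical toy stand-ins for the decision tree, KNN, SOM, and SVM members, combination by plain majority vote is an assumption (the abstract does not state the combination rule), and the naive Bayes noise-reduction step is omitted because the abstract gives no detail on it. Evaluation is test-then-train: each example is predicted first and only then used for learning.

```python
from collections import Counter, deque
import math


class OneNN:
    """Toy stand-in: 1-nearest-neighbour over a sliding window of recent examples."""

    def __init__(self, window=200):
        self.buf = deque(maxlen=window)  # old examples fall out as the stream moves on

    def learn(self, x, y):
        self.buf.append((x, y))

    def predict(self, x):
        if not self.buf:
            return None
        return min(self.buf, key=lambda p: math.dist(p[0], x))[1]


class NearestCentroid:
    """Toy stand-in: predicts the class whose running mean is closest to x."""

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def learn(self, x, y):
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None

        def centroid(label):
            return [v / self.counts[label] for v in self.sums[label]]

        return min(self.counts, key=lambda label: math.dist(centroid(label), x))


class MajorityClass:
    """Toy stand-in: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


def vote(members, x):
    """Majority vote over the members that can already predict (assumed rule)."""
    votes = [p for p in (m.predict(x) for m in members) if p is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None


def prequential_accuracy(stream, members):
    """Test-then-train: predict each example, score it, then learn from it."""
    correct = total = 0
    for x, y in stream:
        pred = vote(members, x)
        if pred is not None:
            total += 1
            correct += pred == y
        for m in members:
            m.learn(x, y)
    return correct / total if total else 0.0
```

In this sketch a single weak member, such as `MajorityClass`, can be outvoted by the other two, which is the intuition behind preferring an ensemble to a single model; whether that holds on a given stream depends on the members and the data.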
Source
《现代计算机》
2018, No. 3, pp. 21-25 (5 pages)
Modern Computer
Keywords
Data Stream
Classification
Ensemble Learning
Single-Model