Abstract
Concept drift and class imbalance in complex data streams degrade classifier performance. Traditional batch learning algorithms are constrained by memory and running time, so they perform poorly on massive, rapidly arriving data streams, which also exhibit frequent drift and class imbalance. Handling complex data streams with online ensemble algorithms has therefore become an important research topic in data mining. This paper first introduces and summarizes the online versions of the bagging, boosting, and stacking ensemble methods from the perspective of ensemble strategy, and compares the performance of different models. It then provides, for the first time, a detailed summary and analysis of online ensemble classification algorithms for complex data streams. Detection and classification algorithms for concept drift data streams are introduced from two perspectives, active detection and passive adaptation, and algorithms for imbalanced data streams are introduced from two perspectives, data preprocessing and cost-sensitive learning. The time and space efficiency of representative algorithms is analyzed, and the performance of algorithms evaluated on the same datasets is compared. Finally, the paper proposes directions for future research in response to the open challenges in online ensemble classification of complex data streams.
Authors
李春鹏
韩萌
孟凡兴
何菲菲
张瑞华
Li Chunpeng; Han Meng; Meng Fanxing; He Feifei; Zhang Ruihua (School of Computer Science & Engineering, North Minzu University, Yinchuan 750021, China)
Source
《计算机应用研究》
CSCD
PKU Core Journal
2024, Issue 3, pp. 641-651 (11 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (62062004)
Natural Science Foundation of Ningxia (2022AAC03279)
Keywords
online learning
ensemble learning
concept drift
imbalance