Abstract
Data streams are divided into static and dynamic data streams, but as data becomes increasingly complex, dynamic data streams have become pervasive in everyday life. This paper analyzes and describes the basic concepts, algorithm characteristics, related work, and advantages and disadvantages of imbalanced data streams, concept drift data streams, and noisy data streams within dynamic data streams. The transmission characteristics, applicable methods, and ensemble classification algorithms of these three kinds of dynamic data streams are introduced and compared. The sudden, incremental, recurrent, and gradual types of concept drift are examined, and the Boosting and Bagging methods commonly used in ensemble classification are studied in depth. The main problems that current ensemble classification algorithms for dynamic data streams still need to solve are identified. In addition, several extensible research directions are analyzed and discussed, including multiple types of concept drift, composite dynamic data streams, and dynamic weighting of ensemble base classifiers.
Authors
刘允峰
佟季萱
叶应图
LIU Yunfeng; TONG Jixuan; YE Yingtu (College of Information Science and Technology, Bohai University, Jinzhou 121013, China; College of Information Science and Engineering, China University of Petroleum, Beijing 102249, China)
Source
《渤海大学学报(自然科学版)》
CAS
2023, No. 1, pp. 79-91 (13 pages)
Journal of Bohai University: Natural Science Edition
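Funding
National Natural Science Foundation of China (No. 62172057)
Liaoning Province Undergraduate Teaching Reform Research Project for General Higher Education (No. 202110167817)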
Keywords
imbalance
concept drift
noise
ensemble classification
overview