
Online Active Learning Method for Imbalanced Data Stream
Abstract Data stream classification is an important research task in data stream mining; its goal is to capture changing class structures from massive, continuously changing data. At present, almost no framework can simultaneously handle the problems that commonly arise in data streams: multi-class imbalance, concept drift, outliers, and the high cost of labeling samples. To address this, we propose an online active learning method for imbalanced data streams (OALM-IDS). AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier, and AdaBoost.M2 additionally introduces the confidence of the weak classifiers; such methods are typically applied to static data. First, we define an importance measure for training samples based on the imbalance ratio and an adaptive forgetting factor, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of the ensemble classifier. Second, we propose an adaptive adjustment method for the margin threshold matrix, which optimizes the label request strategy. Finally, we incorporate the degree of concept drift into model construction by defining an adaptive forgetting factor based on a concept drift index, which enables model reconstruction after drift. Comparative experiments on six artificial and four real data streams show that the proposed method achieves better classification performance than five other learning methods for imbalanced data streams.
Authors LI Yan-Hong (李艳红); REN Lin (任霖); WANG Su-Ge (王素格); LI De-Yu (李德玉) (School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006)
Source Acta Automatica Sinica (《自动化学报》), 2024, No. 7, pp. 1389-1401 (13 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
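Funding Supported by the National Natural Science Foundation of China (62076158, 62072294, 41871286) and the Key Research and Development Program of Shanxi Province (201903D421041).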
Keywords Active learning; data stream classification; multi-class imbalance; concept drift
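
As a rough illustration of two ideas named in the abstract (weighting training samples by class imbalance under a forgetting factor, and requesting labels based on a prediction margin), the following Python sketch may help. It is not the OALM-IDS algorithm. The class name `ImbalanceAwareSampler`, the fixed forgetting factor, the single scalar margin threshold, and the max-ratio importance formula are all assumptions made for illustration; the paper describes an adaptive forgetting factor and a margin threshold matrix, whose definitions are not given in the abstract.

```python
class ImbalanceAwareSampler:
    """Hypothetical sketch, not the paper's method.

    Tracks exponentially decayed class proportions and decides whether
    to request a label from the margin between the two top class scores.
    """

    def __init__(self, n_classes, forgetting=0.9, margin_threshold=0.3):
        self.n_classes = n_classes
        self.forgetting = forgetting              # assumed fixed here; adaptive in the paper
        self.margin_threshold = margin_threshold  # assumed scalar; a matrix in the paper
        # Start from uniform class proportions.
        self.proportions = [1.0 / n_classes] * n_classes

    def update(self, label):
        """Decay old class proportions and add mass to the observed class."""
        for c in range(self.n_classes):
            hit = 1.0 if c == label else 0.0
            self.proportions[c] = (
                self.forgetting * self.proportions[c]
                + (1.0 - self.forgetting) * hit
            )

    def importance(self, label):
        """Weight a sample by how rare its class currently is (assumed formula)."""
        largest = max(self.proportions)
        return largest / max(self.proportions[label], 1e-12)

    def should_request_label(self, probs):
        """Request a label when the top-two probability margin is small."""
        top, second = sorted(probs, reverse=True)[:2]
        return (top - second) < self.margin_threshold


sampler = ImbalanceAwareSampler(n_classes=3)
for y in [0] * 20 + [1]:          # a stream dominated by class 0
    sampler.update(y)

minority_weight = sampler.importance(1)
majority_weight = sampler.importance(0)
uncertain = sampler.should_request_label([0.50, 0.45, 0.05])
confident = sampler.should_request_label([0.90, 0.05, 0.05])
```

In this sketch a sample from the rarer class receives a larger weight than one from the dominant class, and only the low-margin prediction triggers a label request. A lower forgetting factor makes the class proportions react faster to recent data, which is the kind of behavior an adaptive, drift-aware factor would tune.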