Online Learning Method for Class Imbalanced and Feature Evolvable Streams
Abstract Feature evolvable streams are data streams whose feature space changes dynamically in arbitrary ways, and they often exhibit an imbalanced class distribution at the same time. Together, these properties pose significant challenges for data stream classification. Online learning is an effective tool for mining data streams; however, few online learning frameworks can handle feature evolution and class imbalance simultaneously. This study therefore proposes an online learning method for class-imbalanced feature evolvable streams. First, the features of each instance are partitioned, the classifier is projected onto the corresponding feature spaces, and the classifiers for the different feature spaces are trained with the online passive-aggressive (PA) algorithm. Next, the minimization of a cost-sensitive metric is incorporated into the model's online optimization objective, and a new cost-sensitive factor, defined from the imbalance ratio, dynamically adjusts the class weights to address class imbalance. Finally, to improve generalization, important features are selected using the coefficient of variation, and an improved projection and truncation strategy sparsifies the classifier. Extensive simulation experiments on 11 UCI datasets show that the method achieves high accuracy, geometric mean (G-mean), and Matthews correlation coefficient (MCC), with average improvements of approximately 0.021, 0.058, and 0.072, respectively. These results verify that the proposed method adapts well to feature evolvable streams and handles class imbalance in such streams effectively.
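The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract gives no equations, so the class name `CostSensitivePA`, the cost factor (ratio of running class counts), the cost-scaled hinge loss, the rule of keeping features with the largest coefficient of variation, and the `keep` parameter are all assumptions made for illustration. Weights are stored per feature name so that instances may arrive with different feature sets, and a helper computes G-mean and MCC for labels in {+1, -1}.

```python
import math
from collections import defaultdict


class CostSensitivePA:
    """Illustrative cost-sensitive PA-I learner over a changing feature space.
    A sketch under stated assumptions, not the paper's exact method."""

    def __init__(self, C=1.0, keep=None):
        self.C = C                    # PA-I cap on the step size
        self.keep = keep              # assumed: max weights kept after truncation
        self.w = {}                   # feature name -> weight
        self.counts = {1: 0, -1: 0}   # running class counts
        # per-feature running [sum, sum of squares, count], used for the CV
        self.stats = defaultdict(lambda: [0.0, 0.0, 0])

    def score(self, x):
        # Only features present in both the instance and the model contribute.
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def cost(self, y):
        # Assumed cost factor: the currently rarer class gets a weight above 1.
        n_y, n_other = self.counts[y], self.counts[-y]
        return max(1.0, n_other / n_y) if n_y else 1.0

    def coefficient_of_variation(self, f):
        s, s2, n = self.stats[f]
        if n == 0:
            return 0.0
        mean = s / n
        var = max(s2 / n - mean * mean, 0.0)
        # CV is undefined for a zero mean; it is treated as 0 in this sketch.
        return math.sqrt(var) / abs(mean) if mean else 0.0

    def update(self, x, y):
        self.counts[y] += 1
        for f, v in x.items():
            st = self.stats[f]
            st[0] += v
            st[1] += v * v
            st[2] += 1
        # Hinge loss scaled by the class cost (one common cost-sensitive variant).
        loss = self.cost(y) * max(0.0, 1.0 - y * self.score(x))
        sq_norm = sum(v * v for v in x.values())
        if loss > 0 and sq_norm > 0:
            tau = min(self.C, loss / sq_norm)   # PA-I step size
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + tau * y * v
        # Assumed truncation rule: keep the `keep` features with the largest CV.
        if self.keep is not None and len(self.w) > self.keep:
            ranked = sorted(self.w, key=self.coefficient_of_variation, reverse=True)
            self.w = {f: self.w[f] for f in ranked[:self.keep]}


def gmean_mcc(y_true, y_pred):
    """Return (G-mean, MCC) for binary labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return math.sqrt(tpr * tnr), mcc
```

In a test-then-train loop, each instance would be passed as a dict such as `{"a": 1.0, "b": 0.5}` first to `predict` and then to `update`; features that disappear from the stream simply stop contributing to the score, and new ones acquire weights on their first update.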
Authors CHEN Yanfei (陈燕菲); LIU Sanmin (刘三民) (School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, Anhui, China)
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2024, No. 9, pp. 92-103 (12 pages)
Funding Natural Science Foundation of Anhui Province (2308085MF220); Key Project of Natural Science Research of Universities in Anhui Province (2022AH050972, KJ2021A0516).
Keywords data stream mining; feature evolution; class imbalance; online learning; cost-sensitive learning