Abstract
The feature space of data often changes dynamically over time while the number of training samples stays fixed, and an ultra-high-dimensional feature space is usually accompanied by class imbalance in the decision space. To address this, an online streaming feature selection algorithm for high-dimensional and class-imbalanced data based on the maximum decision boundary is proposed. Using the neighborhood rough set model and taking full account of the influence of boundary samples, an adaptive neighborhood relation is defined and a rough dependency formula based on the maximum decision boundary is designed. In addition, three online feature subset evaluation metrics are proposed to select features that discriminate strongly between the majority and minority classes. Experiments on eleven high-dimensional and class-imbalanced datasets indicate that, under the same experimental environment and the same number of features, the proposed algorithm achieves better overall performance than some state-of-the-art online streaming feature selection algorithms.
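As background for the neighborhood rough set model mentioned in the abstract, the following is a minimal sketch of the classical fixed-radius neighborhood dependency: the fraction of samples whose neighborhood contains only samples of the same class. It is not the paper's adaptive neighborhood relation or its maximum-decision-boundary dependency formula, neither of which is specified in the abstract. The function name, the radius `delta`, and the toy data are illustrative assumptions.

```python
from math import dist


def neighborhood_dependency(samples, labels, features, delta):
    """Classical neighborhood rough set dependency (illustrative sketch).

    Returns the fraction of samples whose delta-neighborhood, measured by
    Euclidean distance on the selected feature indices, is label-pure.
    """
    # Project every sample onto the selected feature subset.
    projected = [[x[f] for f in features] for x in samples]
    consistent = 0
    for i, xi in enumerate(projected):
        # Sample i is in the positive region if every sample within
        # distance delta of it (itself included) shares its label.
        pure = all(
            labels[j] == labels[i]
            for j, xj in enumerate(projected)
            if dist(xi, xj) <= delta
        )
        consistent += pure
    return consistent / len(samples)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
samples = [(0.0, 0.5), (0.1, 0.9), (0.9, 0.4), (1.0, 0.8)]
labels = [0, 0, 1, 1]
dep_f0 = neighborhood_dependency(samples, labels, [0], delta=0.2)
dep_f1 = neighborhood_dependency(samples, labels, [1], delta=0.2)
```

In this toy data, feature 0 yields a higher dependency than feature 1, which is the sense in which a dependency score can rank candidate features as they arrive in a stream.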
Authors
林耀进
陈祥焰
白盛兴
王晨曦
LIN Yaojin; CHEN Xiangyan; BAI Shengxing; WANG Chenxi (School of Computer Science and Engineering, Minnan Normal University, Zhangzhou 363000; Key Laboratory of Data Science and Intelligence Application, The Education Department of Fujian Province, Minnan Normal University, Zhangzhou 363000)
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals (北大核心)
2020, No. 9, pp. 820-829 (10 pages)
Pattern Recognition and Artificial Intelligence
Funding
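National Natural Science Foundation of China (No. 61672272)
Natural Science Foundation of Fujian Province (No. 2018J01548, 2018J01547)
Science and Technology Project of the Education Department of Fujian Province (No. JAT180318)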
Keywords
Online Feature Selection
High-Dimensional and Class-Imbalanced Data
Adaptive Neighborhood
Neighborhood Rough Set