Abstract
Existing online streaming feature selection algorithms usually select a single optimal global feature subset and assume that it suits every region of the sample space. However, each region of the sample space is described accurately by its own feature subset, and these subsets may differ in both their features and their size. This paper therefore proposes a local online streaming feature selection algorithm based on the maximum decision boundary. Local feature selection is introduced and, making full use of local information, a feature evaluation criterion based on the maximum decision boundary is designed to separate same-class samples from different-class samples as far as possible. Three strategies (maximizing the average decision boundary, maximizing the decision boundary, and minimizing redundancy) are used to select appropriate features. An optimal feature subset is selected for each local region, and classification is then performed with a class similarity measure. Experimental results and statistical hypothesis tests on 14 datasets verify the classification effectiveness and stability of the proposed algorithm.
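The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a Relief-style margin (distance to the nearest different-class sample minus distance to the nearest same-class sample) as a stand-in for the decision-boundary criterion, a Pearson-correlation threshold as a stand-in for the redundancy check, and a single anchor sample as the "local region". The names `pearson`, `local_margin`, `stream_select`, and `max_corr` are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def local_margin(X, y, idx, features):
    """Assumed margin for anchor sample idx, using only the given feature columns:
    L1 distance to the nearest other-class sample minus L1 distance to the
    nearest same-class sample. Requires at least one sample of each kind."""
    def dist(i):
        return sum(abs(X[i][f] - X[idx][f]) for f in features)
    hits = [dist(i) for i in range(len(X)) if i != idx and y[i] == y[idx]]
    misses = [dist(i) for i in range(len(X)) if y[i] != y[idx]]
    return min(misses) - min(hits)

def stream_select(X, y, idx, max_corr=0.95):
    """Process features one at a time, as if they arrived in a stream.
    A feature is kept only if it is not highly correlated with an already
    selected feature and it strictly increases the anchor's margin."""
    selected, best = [], 0.0
    for f in range(len(X[0])):
        col_f = [row[f] for row in X]
        if any(abs(pearson(col_f, [row[g] for row in X])) > max_corr
               for g in selected):
            continue  # redundant with a selected feature
        margin = local_margin(X, y, idx, selected + [f])
        if margin > best:
            selected.append(f)
            best = margin
    return selected, best

# Toy data: column 0 is informative, column 1 is noise, column 2 duplicates column 0.
X = [[0.0, 0.5, 0.0], [0.1, 0.9, 0.2], [0.2, 0.1, 0.4],
     [1.0, 0.6, 2.0], [1.1, 0.2, 2.2], [1.2, 0.8, 2.4]]
y = [0, 0, 0, 1, 1, 1]
selected, best = stream_select(X, y, idx=0)
```

In this toy run the noise column is rejected because it shrinks the margin, and the duplicated column is skipped as redundant. Running `stream_select` with a different `idx` can return a different subset, which is the sense in which the selection is local rather than global.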
Authors
SUN Shiming (孙世明)
DENG Ansheng (邓安生)
School of Information Science and Technology, Dalian Maritime University, Dalian 116026
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, Issue 12, pp. 1131-1142 (12 pages)
Keywords
Feature Selection
Streaming Feature
Local Feature Selection
Max-Decision Boundary