Abstract
Feature selection algorithms in machine learning simplify model inputs, improve interpretability, and help avoid the curse of dimensionality and overfitting. In wrapper-based feature selection, the evaluation model usually takes the feature subset produced by the search algorithm directly as input, so the exploitation and assessment of features is limited by the evaluation model's own feature-learning ability, which in turn limits the discovery of better-suited feature subsets. To address this, a wrapper method with subset feature pre-learning based on the cascade forest structure is proposed. The method inserts a multi-layer cascade forest between the search algorithm and the evaluation model, transforming each candidate feature subset into a high-level feature set; this lowers the pattern-recognition difficulty faced by the evaluation model and improves the assessment of subset performance. Experiments comparing various combinations of search algorithms and evaluation models on multiple datasets show that the proposed method further reduces the number of selected features while maintaining classification performance and preserving the low coupling of wrapper methods.
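The idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: a cascade-forest-style layer (in the spirit of gcForest) re-learns a candidate feature subset into class-probability features before a wrapper's evaluation model scores it. All function names, parameters, and model choices here are illustrative assumptions.

```python
# Hedged sketch: wrapper-style subset evaluation with a cascade-forest
# pre-learning step between the search algorithm and the evaluation model.
# The specific models, layer count, and dataset are illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score


def cascade_transform(X, y, n_layers=2, random_state=0):
    """Re-learn X into high-level features via stacked forest layers."""
    features = X
    for layer in range(n_layers):
        probas = []
        for Forest in (RandomForestClassifier, ExtraTreesClassifier):
            clf = Forest(n_estimators=50, random_state=random_state + layer)
            # Out-of-fold class probabilities avoid leaking labels
            # into the learned features.
            probas.append(
                cross_val_predict(clf, features, y, cv=3,
                                  method="predict_proba"))
        # Each layer keeps the raw subset alongside the learned features,
        # mirroring the cascade forest's feature re-use.
        features = np.hstack([X] + probas)
    return features


def evaluate_subset(X, y, subset):
    """Score a feature subset after cascade pre-learning (wrapper step)."""
    X_sub = X[:, subset]
    X_high = cascade_transform(X_sub, y)
    evaluator = LogisticRegression(max_iter=1000)
    return cross_val_score(evaluator, X_high, y, cv=3).mean()


X, y = load_breast_cancer(return_X_y=True)
score = evaluate_subset(X, y, subset=[0, 7, 20, 27])
print(round(score, 3))
```

In a full wrapper method, a search algorithm (e.g. a genetic algorithm) would call `evaluate_subset` for each candidate subset; only the pre-learning step differs from a plain wrapper, so the search and evaluation components remain loosely coupled, consistent with the abstract's claim.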
Authors
PAN Limin;TONG Tong;LUO Senlin;QIN Xiaonan(Information System & Security and Countermeasures Experiments Center, Beijing Institute of Technology, Beijing 100081,China)
Source
《北京理工大学学报》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2021, No. 11, pp. 1201-1206 (6 pages)
Transactions of Beijing Institute of Technology
Funding
National Science and Technology Support Program of the 13th Five-Year Plan (SQ2018YFC200004); Special Scientific Research Fund for the Health Industry, National Ministry of Health (201302008).
Keywords
feature selection
wrapper method
cascade forest
feature learning