Abstract
Feature selection is an important data preprocessing technique that can significantly improve the performance of machine learning algorithms. Stability is an important research topic in feature selection: it refers to a feature selection method's ability to select the same or similar feature subsets from slightly perturbed training data. Improving the stability of feature selection helps discover relevant features, raises domain experts' confidence in the results, reduces data storage overhead, and further improves the performance of learning algorithms. Methods for improving feature selection stability are divided into perturbation methods and feature methods: perturbation methods include data perturbation, function perturbation, and hybrid methods, while feature methods include group feature methods and feature information methods. This paper explains the meaning of each category, reviews representative methods from recent years, summarizes the advantages and disadvantages of each, and points out future research directions.
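As a minimal, self-contained illustration of the stability notion defined above (a sketch, not taken from the paper), the following Python code perturbs the training data by bootstrap resampling, runs a selector on each resample (scikit-learn's SelectKBest with f_classif is an assumed stand-in for any feature selection method), and scores stability as the average pairwise Jaccard similarity of the selected subsets; a value of 1.0 means the same subset is chosen on every perturbed sample.

import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

def selection_stability(X, y, k=10, runs=20, seed=0):
    """Average pairwise Jaccard similarity of feature subsets selected
    on bootstrap-perturbed copies of the training data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    subsets = []
    for _ in range(runs):
        # Data perturbation: bootstrap resample of the training set.
        idx = rng.choice(n, size=n, replace=True)
        selector = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
        subsets.append(frozenset(np.flatnonzero(selector.get_support())))
    # Jaccard similarity |A ∩ B| / |A ∪ B| over all pairs of runs.
    sims = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return float(np.mean(sims))

if __name__ == "__main__":
    # Synthetic data: only 5 of 50 features are informative.
    X, y = make_classification(n_samples=200, n_features=50,
                               n_informative=5, random_state=0)
    print(f"Jaccard stability: {selection_stability(X, y):.3f}")

The same scaffold applies to the other categories surveyed: function perturbation would vary the selector (or its hyperparameters) across runs instead of the data, and hybrid methods would vary both.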
Authors
WANG Ji-chuan (王吉川); LIU Yi (刘艺), Defense Innovation Institute, Beijing 100071
Source
Digital Technology & Application (《数字技术与应用》), 2021, Issue 9, pp. 19-21 (3 pages)
Funding
Science and Technology Innovation 2030 Major Project of the Ministry of Science and Technology of China (2020AAA0104800).
Keywords
Feature selection
Stability
High-dimensional data
Ensemble learning
Machine learning