Abstract
Feature selection is a key problem in machine learning and pattern recognition. As research in pattern recognition and data mining advances, the objects under study grow more complex and their feature dimensionality grows higher, which makes the stability of feature selection particularly important. Based on the sparse SVM (support vector machine) model, this paper analyzes the L1SVM (1-norm support vector machine), applies it to feature selection on high-dimensional data, and integrates the feature selection results following the ensemble learning principle. The paper also proposes a stability measure for high-dimensional data. Experimental results on gene expression data show that ensemble feature selection can effectively improve the stability of feature selection.
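The abstract does not give the proposed stability measure or the aggregation rule, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the average pairwise Jaccard similarity between selected feature subsets as a stand-in stability score (a common choice, not the measure proposed in the paper) and a simple frequency vote as a stand-in ensemble rule. The function names, the `min_fraction` parameter, and the example subsets are all hypothetical; in practice each subset would come from one run of a sparse selector such as a 1-norm SVM on a resampled training set.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature-index sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all pairs of selected-feature subsets.

    Assumption: this is a generic stability score, not the paper's measure.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def aggregate_by_vote(subsets, min_fraction=0.5):
    """Keep features selected in at least min_fraction of the runs.

    Assumption: frequency voting is one possible ensemble rule among many.
    """
    counts = Counter(f for s in subsets for f in set(s))
    threshold = min_fraction * len(subsets)
    return sorted(f for f, c in counts.items() if c >= threshold)


# Hypothetical feature indices selected in three runs on resampled data.
runs = [{0, 3, 7, 9}, {0, 3, 8}, {0, 2, 3, 7}]
print(average_pairwise_stability(runs))
print(aggregate_by_vote(runs))
```

A score near 1 means the runs select nearly the same features, and a score near 0 means they barely overlap. The vote keeps only features that recur across runs, which is one intuition for why an ensemble can be more stable than a single run.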
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
2012, No. 10, pp. 948-953 (6 pages)
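Funding
National Natural Science Foundation of China, No. 61003116
Key Program of the Natural Science Foundation of Jiangsu Province, No. BK2011005
Natural Science Foundation of Jiangsu Province, Nos. BK2011782, BK2010263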
Keywords
feature selection
high-dimensional data
stability
1-norm support vector machine (L1SVM)
ensemble