Abstract
Relief selects the top k features as the reduced feature subset, but a sample's nearest neighbors in the original feature space are not necessarily its nearest neighbors in the reduced feature subspace. To overcome this problem, a method is proposed that evaluates the class-discrimination capability of a candidate feature subset within that subset's own feature subspace. Combining this evaluation method with a best-first search strategy yields a new feature subset selection method. The method was compared with three classical feature selection methods in terms of reduction capability on 12 UCI (University of California, Irvine) data sets and one real-world Alzheimer's disease data set. Its classification performance was also compared in detail using decision tree and logistic regression models. Experimental results show that the proposed method selects feature subsets with fewer features on most data sets and also achieves good classification performance in most cases.
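The abstract does not spell out the subset-evaluation criterion, so the following Python code is only a minimal sketch of the general idea under stated assumptions. It is not the authors' method. The scoring function `subspace_score` (a hypothetical name) uses a Relief-style margin: for each sample, the nearest-miss distance minus the nearest-hit distance, averaged over all samples. Unlike plain Relief, it searches both neighbors inside the candidate subspace rather than in the original feature space. `best_first_select` (also hypothetical) is a generic best-first forward search, and its stall limit `max_stale` is an assumed parameter. The sketch also assumes that features are scaled to [0, 1] and that every class has at least two samples. The six-sample data set is made up for illustration.

```python
import heapq
import itertools
import math


def dist(a, b, feats):
    """Mean absolute difference over the features in `feats` only.

    Assumes every feature is pre-scaled to [0, 1]; Relief likewise
    normalizes numeric differences by each feature's range.
    """
    return sum(abs(a[f] - b[f]) for f in feats) / len(feats)


def nearest(X, y, i, feats, same_class=None):
    """Index of sample i's nearest neighbor, measured in subspace `feats`.

    same_class=True -> nearest hit, False -> nearest miss, None -> any class.
    """
    best_j, best_d = None, math.inf
    for j, x in enumerate(X):
        if j == i:
            continue
        if same_class is not None and (y[j] == y[i]) != same_class:
            continue
        d = dist(X[i], x, feats)
        if d < best_d:  # strict: the first of several equidistant samples wins
            best_j, best_d = j, d
    return best_j


def subspace_score(X, y, feats):
    """Hypothetical subset criterion (not the paper's exact formula).

    Relief-style margin: nearest-miss distance minus nearest-hit distance,
    averaged over samples, with BOTH neighbors searched in the candidate
    subspace instead of the original feature space.
    Assumes every class has at least two samples, so a hit always exists.
    """
    total = 0.0
    for i in range(len(X)):
        hit = nearest(X, y, i, feats, same_class=True)
        miss = nearest(X, y, i, feats, same_class=False)
        total += dist(X[i], X[miss], feats) - dist(X[i], X[hit], feats)
    return total / len(X)


def best_first_select(X, y, max_stale=5):
    """Generic best-first forward search guided by subspace_score.

    Repeatedly expands the highest-scoring open subset by one feature and
    stops after `max_stale` consecutive expansions without a new best score.
    """
    n = len(X[0])
    tie = itertools.count()  # tie-breaker so heapq never compares frozensets
    heap = [(0.0, next(tie), frozenset())]  # start from the empty subset
    visited = {frozenset()}
    best_set, best_score = frozenset(), -math.inf
    stale = 0
    while heap and stale < max_stale:
        _, _, current = heapq.heappop(heap)  # most promising open subset
        improved = False
        for f in range(n):
            cand = current | {f}
            if cand in visited:
                continue
            visited.add(cand)
            s = subspace_score(X, y, sorted(cand))
            heapq.heappush(heap, (-s, next(tie), cand))  # negate: max-heap
            if s > best_score:  # strict: on ties the first subset found is kept
                best_set, best_score, improved = cand, s, True
        stale = 0 if improved else stale + 1
    return sorted(best_set), best_score


# Made-up toy data: feature 0 separates the classes; features 1 and 2 carry
# no class information on their own.
X = [
    [0.0, 0.5, 0.9],
    [0.1, 0.0, 0.1],
    [0.2, 1.0, 0.5],
    [0.8, 0.5, 0.1],
    [0.9, 1.0, 0.9],
    [1.0, 0.0, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

# Sample 0's nearest neighbor changes once the space is reduced.
print(nearest(X, y, 0, [0, 1, 2]), nearest(X, y, 0, [0]))  # 2 1

subset, score = best_first_select(X, y)
print(subset, round(score, 3))  # [0] 0.6
```

On the toy data, sample 0's nearest neighbor is sample 2 in the full three-feature space but sample 1 in the subspace of feature 0 alone. This is the kind of mismatch described above. Recomputing neighbors for every candidate costs O(N²·|S|) distance work per evaluation (N samples, |S| selected features). That cost is the price of scoring each subset in its own subspace.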
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journal
2011, No. 2, pp. 1-5 (5 pages)
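Funding
National High Technology Research and Development Program of China (863 Program) (2006AA02Z347)
International Science and Technology Cooperation Program of the Ministry of Science and Technology of China (2009DFA12290)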
Keywords
feature selection
feature subset
feature evaluation
classification
Alzheimer's disease