Abstract
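Feature selection is an important step in pattern recognition and machine learning, and the quality of the selected feature subset directly affects both the efficiency and the accuracy of classification learning algorithms. Existing feature selection algorithms evaluate features from the perspective of the class label set as a whole and do not examine the relationship between each individual class and the features. This paper proposes a feature selection algorithm based on KL divergence and a separate-class strategy. The algorithm uses the separate-class strategy to examine, one class at a time, the relationship between each class in the label set and the features. It uses an effective distance based on KL divergence to measure both the relevance between classes and features and the redundancy among features. Experimental results show that the proposed algorithm runs efficiently and that, in terms of the quality of the selected features, it significantly outperforms the classical CFS, FCBF and ReliefF feature selection algorithms.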
Feature selection is one of the core issues in designing pattern recognition systems and has attracted considerable attention in the literature. Most feature selection methods in the literature handle relevance and redundancy analysis only from the point of view of the class label as a whole, neglecting the relationship between features and individual class labels. To this end, a novel KL-divergence-based feature selection algorithm is proposed that explicitly handles relevance and redundancy analysis for each class label through a separate-class strategy. A KL-divergence-based effective distance metric is also introduced to conduct the relevance and redundancy analysis. Experimental results show that the proposed algorithm is efficient and outperforms three representative algorithms, CFS, FCBF and ReliefF, with respect to the quality of the selected feature subset.
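The abstract does not define the paper's effective distance, so the sketch below only illustrates the general idea of a separate-class, KL-based filter under stated assumptions. It assumes discrete features and substitutes standard quantities: per-class relevance is the one-vs-rest divergence KL(P(f | y = c) || P(f | y != c)) with Laplace smoothing, and redundancy between two features is their mutual information, which is itself a KL divergence, KL(P(f, g) || P(f) P(g)). The function names (`smoothed_dist`, `class_relevance`, `redundancy`, `select_features`) and the thresholds `k_per_class`, `min_relevance` and `max_redundancy` are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def smoothed_dist(values, support, alpha=1.0):
    """Laplace-smoothed probability of each symbol in `support` (same order)."""
    counts = Counter(values)
    total = len(values) + alpha * len(support)
    return [(counts[s] + alpha) / total for s in support]


def kl(p, q):
    """KL(p || q) in nats for aligned probability lists; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def class_relevance(feature, labels, c):
    """One-vs-rest relevance of a discrete feature to class c:
    KL(P(f | y = c) || P(f | y != c)). A stand-in, not the paper's effective distance."""
    support = sorted(set(feature))
    inside = [f for f, y in zip(feature, labels) if y == c]
    outside = [f for f, y in zip(feature, labels) if y != c]
    return kl(smoothed_dist(inside, support), smoothed_dist(outside, support))


def redundancy(f, g):
    """Empirical mutual information I(f; g) = KL(P(f, g) || P(f) P(g)), in nats."""
    n = len(f)
    joint, pf, pg = Counter(zip(f, g)), Counter(f), Counter(g)
    return sum((k / n) * math.log(k * n / (pf[a] * pg[b]))
               for (a, b), k in joint.items())


def select_features(X, labels, k_per_class=2, min_relevance=0.3, max_redundancy=0.5):
    """Hypothetical separate-class greedy filter: for each class, walk features in
    decreasing one-vs-rest relevance and keep those that are relevant enough and not
    too redundant with any feature already kept. Returns sorted column indices."""
    columns = [list(col) for col in zip(*X)]
    selected = []
    for c in sorted(set(labels)):
        rel = {j: class_relevance(col, labels, c) for j, col in enumerate(columns)}
        ranked = sorted(rel, key=rel.get, reverse=True)
        kept = 0
        for j in ranked:
            if kept == k_per_class or rel[j] < min_relevance:
                break  # enough features for this class, or the rest are irrelevant
            if j in selected:
                kept += 1  # already chosen while processing an earlier class
                continue
            if all(redundancy(columns[j], columns[s]) <= max_redundancy for s in selected):
                selected.append(j)
                kept += 1
    return sorted(selected)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    f0 = [1, 1, 1, 0, 0, 0, 0, 0, 0]  # flags class 0
    f1 = [0, 0, 0, 1, 1, 1, 0, 0, 0]  # flags class 1
    f2 = list(f0)                     # exact copy of f0 (redundant)
    f3 = [5] * 9                      # constant (irrelevant)
    X = [list(row) for row in zip(f0, f1, f2, f3)]
    print(select_features(X, labels))  # [0, 1]
```

On this toy data, the copy `f2` is as relevant to class 0 as `f0` but is rejected as redundant, and the constant `f3` has zero relevance to every class. Treating each class separately lets a feature that identifies only one class (such as `f1`) be selected even though it barely separates the other classes.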
Source
《计算机科学》 (Computer Science)
CSCD
Peking University Core Journal
2012, No. 12, pp. 224-227 (4 pages)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60973085)
Keywords
Feature selection
KL-divergence
Separate-class strategy
Effective distance