Abstract
Feature selection is an important data preprocessing step, and mutual information is a widely used class of information measures for it. However, mutual information cannot directly measure relevance among numeric features. In this paper, we first introduce neighborhood entropy and neighborhood mutual information. We then design a feature ranking algorithm based on the max-relevance and min-redundancy criterion with neighborhood mutual information. Finally, classifiers are trained on the top-ranked features and their accuracy is compared with that of other algorithms. Experimental results show that the proposed method effectively selects a discriminative feature subset and is superior or comparable to other popular feature selection algorithms in classification accuracy.
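The abstract does not give the formulas or the selection procedure, so the following is a minimal Python sketch of the kind of method it describes, assuming the commonly used definitions of neighborhood entropy, NH_delta(X) = -(1/n) * sum_i log(|delta-neighborhood of x_i| / n), and neighborhood mutual information, NMI_delta(X;Y) = NH_delta(X) + NH_delta(Y) - NH_delta(X,Y). The radius delta, the [0, 1] feature scaling, and all function names (neighborhood_entropy, neighborhood_mi, nmi_mrmr_rank) are illustrative assumptions, not taken from the paper.

# A minimal sketch (not the authors' code) of max-relevance, min-redundancy
# feature ranking driven by neighborhood mutual information.
import numpy as np

def neighborhood_entropy(X, delta=0.2):
    """NH_delta(X) = -(1/n) * sum_i log(|delta-neighborhood of x_i| / n)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]          # treat a single feature as an n x 1 matrix
    n = X.shape[0]
    # pairwise Euclidean distances; every sample lies in its own neighborhood
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    counts = (dist <= delta).sum(axis=1)
    return -np.mean(np.log(counts / n))

def neighborhood_mi(x, y, delta=0.2):
    """NMI_delta(X;Y) = NH(X) + NH(Y) - NH(X,Y)."""
    joint = np.column_stack([x, y])
    return (neighborhood_entropy(x, delta) + neighborhood_entropy(y, delta)
            - neighborhood_entropy(joint, delta))

def nmi_mrmr_rank(X, y, k, delta=0.2):
    """Greedily rank k features: maximize NMI with the class (relevance)
    minus the mean NMI with already ranked features (redundancy).
    Assumes features in X are scaled to [0, 1] and y holds integer class
    labels, so the delta-neighborhood in the joint space stays within a class
    whenever delta < 1."""
    n_features = X.shape[1]
    relevance = [neighborhood_mi(X[:, j], y, delta) for j in range(n_features)]
    ranked, remaining = [], list(range(n_features))
    while remaining and len(ranked) < k:
        best, best_score = None, -np.inf
        for j in remaining:
            redundancy = (np.mean([neighborhood_mi(X[:, j], X[:, s], delta)
                                   for s in ranked]) if ranked else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        ranked.append(best)
        remaining.remove(best)
    return ranked

For example, nmi_mrmr_rank(X, y, k=10) would return the indices of the ten top-ranked features, which could then be fed to a classifier as in the comparison the abstract describes.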
Source
Journal of ZhangZhou Teachers College (Natural Science) (《漳州师范学院学报(自然科学版)》), 2013, No. 4, pp. 13-18 (6 pages)
Funding
Natural Science Foundation of Fujian Province (2013J01259)
Zhangzhou Municipal Science and Technology Program (ZZ2013J04)
Keywords
feature selection
neighborhood mutual information
max relevance
min redundancy