Abstract
To address the difficulty of computing conditional mutual information for continuous-valued features and the bias toward multi-valued features, this paper proposes a feature selection method based on Parzen-window conditional mutual information, called PCMIFS. The method uses a Parzen window to estimate the probability density function of continuous-valued features, which makes the conditional mutual information convenient to compute accurately. It also introduces feature dispersion as a penalty factor in the evaluation criterion, which overcomes the bias of conditional mutual information toward multi-valued features and enables feature selection on continuous data. Experimental results show that, compared with several existing methods, PCMIFS achieves comparable or better performance and is an effective feature selection method.
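The abstract does not give the estimator's formulas, so the following is only a minimal sketch of one common way to estimate conditional mutual information for continuous features with Gaussian Parzen windows. It estimates the class posterior p(c | x) from kernel sums, computes I(X; C) = H(C) - H(C | X), and then uses the identity I(X; C | S) = I(X, S; C) - I(S; C). The function names, the shared bandwidth `h`, and the choice of a Gaussian kernel are assumptions for illustration, not the paper's definitions. The feature-dispersion penalty is left out because its form is not stated here.

```python
import numpy as np


def parzen_posteriors(X, y, h):
    """Estimate p(c | x_i) for every sample with Gaussian Parzen windows.

    Hypothetical helper: the kernel and the single bandwidth h are assumptions.
    """
    y = np.asarray(y)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    classes = np.unique(y)
    # Pairwise squared Euclidean distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * h * h))
    # Kernel mass contributed by each class at every sample point.
    scores = np.stack([K[:, y == c].sum(axis=1) for c in classes], axis=1)
    return scores / scores.sum(axis=1, keepdims=True)


def parzen_mi(X, y, h=0.5):
    """Estimate I(X; C) = H(C) - H(C | X) in nats."""
    y = np.asarray(y)
    _, counts = np.unique(y, return_counts=True)
    prior = counts / counts.sum()
    h_c = -(prior * np.log(prior)).sum()
    post = parzen_posteriors(X, y, h)
    # Sample average of the posterior entropy approximates H(C | X).
    h_c_given_x = -(post * np.log(post + 1e-12)).sum(axis=1).mean()
    return h_c - h_c_given_x


def parzen_cmi(x, S, y, h=0.5):
    """Estimate I(x; C | S) as I(x, S; C) - I(S; C)."""
    n = len(y)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    S = np.asarray(S, dtype=float).reshape(n, -1)
    return parzen_mi(np.hstack([x, S]), y, h) - parzen_mi(S, y, h)
```

In a greedy selection loop, one would call `parzen_cmi(candidate, selected, labels)` for each remaining candidate and keep the highest-scoring one. Features are assumed to be standardized beforehand so that one bandwidth is reasonable for all of them.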
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2015, No. 5, pp. 1387-1389, 1398 (4 pages)
Keywords
feature selection
Parzen window
conditional mutual information
feature dispersion