Abstract
To address the feature selection problem in data mining, this paper examines two definitions of the Hellinger distance based on its properties, proposes a feature selection method built on the Hellinger distance, and designs two corresponding algorithms. Experiments on several data sets show that the features selected by the new algorithms are effective. Compared with other feature selection algorithms, the two algorithms select fewer features while achieving good C4.5 classification accuracy, and they improve the average classification accuracy on the data sets studied.
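The abstract does not spell out the two definitions of the Hellinger distance that the paper studies, so the sketch below uses only the standard discrete form, H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), as an illustration of how such a distance might score features. It assumes a two-class problem with categorical features, and the helper names `hellinger`, `class_conditional`, and `rank_features` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def hellinger(p, q):
    """Standard Hellinger distance between two discrete distributions.

    p and q map each feature value to its probability; the result lies in [0, 1].
    """
    keys = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(total / 2.0)


def class_conditional(values, labels, target):
    """Empirical distribution of a feature's values within one class."""
    selected = [v for v, y in zip(values, labels) if y == target]
    counts = Counter(selected)
    n = len(selected)
    return {v: c / n for v, c in counts.items()}


def rank_features(columns, labels):
    """Rank features by the distance between their two class-conditional
    distributions (an assumed scoring rule, not the paper's algorithm)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    a, b = classes
    scores = {
        name: hellinger(
            class_conditional(vals, labels, a),
            class_conditional(vals, labels, b),
        )
        for name, vals in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


labels = [0, 0, 1, 1]
columns = {
    "f1": ["a", "a", "b", "b"],  # separates the two classes perfectly
    "f2": ["x", "y", "x", "y"],  # identical distribution in both classes
}
ranking = rank_features(columns, labels)
```

In this toy data `f1` ranks first because its class-conditional distributions do not overlap, while `f2` scores zero. A selection step would then keep the top-ranked features before training a classifier such as C4.5.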
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2010, Issue 6, pp. 1530-1532, 1634 (4 pages)
Journal of Computer Applications
Funding
Supported by the Natural Science Foundation of Jiangsu Province (BK2009233)