Abstract
Feature selection is an effective technique for handling high-dimensional data. To address the shortcomings of traditional feature selection methods, a new evaluation criterion, minimum redundancy and maximum separability, is proposed by combining the F-score with mutual information; features selected under this criterion have better classification and prediction ability. Two search strategies, the binary cuckoo search algorithm and quadratic programming, are used to search for the optimal feature subset, and their accuracy and computational cost are analyzed and compared. Finally, experiments on UCI datasets demonstrate the effectiveness of the proposed approach.
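The abstract does not give the exact form of the criterion, so the following is only a rough sketch of the general idea of scoring features by F-score (separability) while penalizing mutual information with already chosen features (redundancy). Everything here is an assumption for illustration: the function names `f_score`, `mutual_information`, and `greedy_select`, the `bins` parameter, the histogram-based mutual information estimate, the "relevance minus mean redundancy" form, and the greedy forward search, which stands in for the binary cuckoo search and quadratic programming strategies used in the paper.

```python
import numpy as np


def f_score(X, y):
    """Per-feature F-score: between-class separation over within-class variance.

    Assumes every class has at least two samples.
    """
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.var(axis=0, ddof=1)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)


def mutual_information(a, b, bins=8):
    """Histogram estimate of I(a; b) in nats (an assumed, simple estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def greedy_select(X, y, k, bins=8):
    """Greedily pick k features maximizing F-score minus mean redundancy.

    Illustrative only: not the search strategy or exact criterion of the paper.
    """
    relevance = f_score(X, y)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean(
                [mutual_information(X[:, j], X[:, s], bins) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Calling `greedy_select(X, y, k)` on a NumPy array `X` of shape (samples, features) with class labels `y` returns a list of `k` column indices. The F-score and the mutual information estimate are on different scales, so a practical criterion would likely need some normalization or weighting between the two terms.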
Authors
赖学方
贺兴时
LAI Xuefang; HE Xingshi (School of Science, Xi'an Polytechnic University, Xi'an 710048, China)
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2017, No. 12, pp. 70-75 (6 pages)
Computer Engineering and Applications
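Funding
Shaanxi Province Soft Science Research Project (No. 2014KRM28-01)
Special Scientific Research Program of the Shaanxi Provincial Department of Education (No. 16JK1341)
Xi'an 2015 Major Bidding Project for Basic Education Research (No. 2015ZB-ZY04)
Graduate Innovation Fund of Xi'an Polytechnic University (No. CX201614)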
Keywords
high-dimensional data
Fisher score (F-score)
search strategy
feature selection