Abstract
To obtain an optimal feature subset for multi-dimensional data sets, a wrapper feature selection algorithm based on feature clustering (FC_W) is proposed. In the initial stage, the original feature set is dynamically divided into several feature subspaces using three-way decision theory, and the features within each subspace are clustered with a feature clustering algorithm. Representative features are then selected from each feature cluster. The remaining features are sorted in descending order of neighborhood mutual information (NMI) and considered one at a time, with a wrapper evaluating whether each candidate should be added to the subset. The result is a feature subset with the lowest classification error rate. Experiments on UCI data sets show that, compared with other feature selection algorithms, the proposed algorithm effectively improves classification accuracy with the libSVM, J48, Naive Bayes and KNN classifiers.
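The wrapper stage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the representative features and the NMI-ranked order of the remaining features are passed in as plain lists (the three-way decision partitioning, the clustering, and the NMI computation are not reproduced), the evaluator is a leave-one-out 1-nearest-neighbour error chosen only to keep the example dependency-free, and the acceptance rule (keep a candidate only if the error strictly decreases) is an assumption the abstract does not spell out. The names `loo_1nn_error` and `wrapper_select` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def loo_1nn_error(X: Sequence[Row], y: Sequence[int], features: List[int]) -> float:
    """Leave-one-out error of a 1-NN classifier restricted to `features`.

    Stand-in evaluator (assumption); the paper reports libSVM, J48,
    Naive Bayes and KNN classifiers instead.
    """
    if not features:
        return 1.0
    errors = 0
    for i in range(len(X)):
        nearest, best = -1, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best:
                nearest, best = j, dist
        if y[nearest] != y[i]:
            errors += 1
    return errors / len(X)


def wrapper_select(
    X: Sequence[Row],
    y: Sequence[int],
    representatives: List[int],
    ranked_rest: List[int],
    evaluate: Callable[[Sequence[Row], Sequence[int], List[int]], float] = loo_1nn_error,
) -> Tuple[List[int], float]:
    """Start from the cluster representatives, then try the remaining
    features in the given (assumed NMI-descending) order."""
    selected = list(representatives)
    best_err = evaluate(X, y, selected)
    for f in ranked_rest:
        err = evaluate(X, y, selected + [f])
        if err < best_err:  # assumed rule: keep only strict improvements
            selected.append(f)
            best_err = err
    return selected, best_err


# Toy data: feature 0 is partly informative, feature 2 complements it,
# feature 1 is noise. Columns are (f0, f1, f2).
X = [
    (0.0, 3.0, 0.0),
    (0.2, 0.0, 0.1),
    (0.5, 3.1, 0.0),
    (0.6, 0.1, 1.0),
    (0.9, 3.2, 1.1),
    (1.0, 0.2, 0.9),
]
y = [0, 0, 0, 1, 1, 1]

selected, err = wrapper_select(X, y, representatives=[0], ranked_rest=[2, 1])
print(selected, err)  # prints: [0, 2] 0.0
```

On this toy data, feature 0 alone misclassifies two of the six samples; adding feature 2 removes both errors, and the noise feature 1 is then rejected because it cannot lower the error further.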
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal index (北大核心)
2018, No. 1, pp. 230-237 (8 pages)
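Funding
National Natural Science Foundation of China (61309014)
Ministry of Education Humanities and Social Sciences Planning Fund (15XJA630003)
Chongqing Basic and Frontier Research Program (cstc2013jcyjA40063)
Chongqing Municipal Education Commission Science and Technology Research Project (KJ1400412)
Chongqing Municipal Education Commission Science and Technology Research Project (KJ1500416)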
Keywords
feature selection
feature clustering
wrapper
neighborhood mutual information (NMI)
three-way decision