Abstract
To address the problem that redundant or irrelevant features in high-dimensional datasets reduce the classification accuracy of machine learning models, a feature selection algorithm based on the approximate Markov blanket, named the normal max-relevance and min-redundancy (nmRMR) algorithm, is proposed. The algorithm first ranks features by relevance using the maximum-relevance, minimum-redundancy criterion. It then applies the approximate Markov blanket to remove redundant or irrelevant features and to maximize the correlation between features, yielding the optimal feature subset. Experimental results on eight public UCI datasets show that, compared with the mRMR algorithm, the proposed algorithm selects on average 6.875 fewer features and improves average classification accuracy by 0.78%; compared with the FullSet algorithm, it selects on average 20.56 fewer features and improves average classification accuracy by 1.88%; and compared with the FCBF algorithm, it selects on average 3.1875 fewer features and improves average classification accuracy by 0.825%. Overall, the proposed algorithm outperforms the other algorithms.
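The two stages described in the abstract (mRMR ranking followed by approximate Markov blanket pruning) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes discrete features, a greedy mRMR score of relevance minus mean redundancy, and the symmetric-uncertainty form of the approximate Markov blanket test used by FCBF. All function names and the toy data are hypothetical.

```python
import numpy as np

def entropy(x):
    """Shannon entropy (bits) of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete arrays."""
    joint = np.stack([x, y], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-np.sum(p * np.log2(p)))
    return max(entropy(x) + entropy(y) - h_xy, 0.0)

def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y)), defined as 0 when both entropies are 0."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_info(x, y) / denom if denom > 0 else 0.0

def mrmr_rank(X, y):
    """Stage 1 (assumed form): greedy mRMR ordering of all feature indices."""
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    ranked, remaining = [], list(range(X.shape[1]))
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in ranked])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def markov_blanket_prune(X, y, ranked):
    """Stage 2 (assumed form): drop feature j if it is irrelevant, or if a kept
    feature i satisfies SU(i, class) >= SU(j, class) and SU(i, j) >= SU(j, class)."""
    su_class = {j: symmetric_uncertainty(X[:, j], y) for j in ranked}
    kept = []
    for j in ranked:
        if su_class[j] <= 0:
            continue  # carries no information about the class
        covered = any(
            su_class[i] >= su_class[j]
            and symmetric_uncertainty(X[:, i], X[:, j]) >= su_class[j]
            for i in kept
        )
        if not covered:
            kept.append(j)
    return kept

# Toy data (hypothetical): feature 0 equals the label, feature 1 duplicates
# feature 0 (redundant), feature 2 is constant (irrelevant).
y = np.array([0, 1] * 50)
X = np.stack([y, y.copy(), np.zeros_like(y)], axis=1)
ranked = mrmr_rank(X, y)
kept = markov_blanket_prune(X, y, ranked)
```

On the toy data, the duplicate and the constant feature are both pruned, leaving only the first feature. Continuous features would need to be discretized before these entropy estimates apply.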
Authors
ZHANG Li (张俐)
WANG Cong (王枞)
GUO Wenming (郭文明)
Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2018, No. 10, pp. 141-145 (5 pages)
Funding
National Science and Technology Basic Work Special Project of China (2015FY111700-6)
Keywords
feature selection
feature relevance
redundant feature
approximate Markov blanket