Abstract
Correlation-based feature selection (CFS) has two limitations: in regression tasks it captures only linear relationships between variables, and in classification tasks it relies on the symmetrical uncertainty measure. To address both, this paper proposes MICCFS, a CFS feature selection algorithm based on the maximum information coefficient (MIC). MICCFS replaces the linear correlation coefficient used in regression tasks and the symmetrical uncertainty used in classification tasks with the MIC measure, and searches for feature subsets with the best-first search algorithm. Experiments on 11 regression datasets and 10 classification datasets from the UCI Machine Learning Repository, using four classifiers (support vector machine (SVM), k-nearest neighbor (k-NN), naive Bayes (NB), and decision tree (DT)), compare MICCFS with CFS and with the commonly used feature selection methods SVMRFE, Lasso, MIM, ReliefF, and Chi-Square. The results show that MICCFS has certain advantages over these methods.
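The abstract does not spell out the subset score, so the following is only a minimal sketch of the idea. It assumes the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-target association and r_ff the mean feature-feature association. Every name in it (`abs_pearson`, `cfs_merit`, `greedy_forward_select`) is hypothetical. Two simplifications are deliberate: `abs_pearson` is a stand-in measure where MICCFS would plug in an MIC estimator, and a plain greedy forward search replaces the best-first search used in the paper.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Measure = Callable[[Vector, Vector], float]


def abs_pearson(x: Vector, y: Vector) -> float:
    """Stand-in association measure in [0, 1] (assumption: NOT MIC).

    MICCFS would use an MIC estimator here instead.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association
    return abs(sxy) / sqrt(sxx * syy)


def cfs_merit(subset: List[int], features: Sequence[Vector],
              target: Vector, measure: Measure = abs_pearson) -> float:
    """Standard CFS merit of a feature subset under a pluggable measure."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean association between each selected feature and the target.
    r_cf = sum(measure(features[i], target) for i in subset) / k
    # Mean association between each pair of selected features.
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = (sum(measure(features[i], features[j]) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_select(features: Sequence[Vector], target: Vector,
                          measure: Measure = abs_pearson) -> Tuple[List[int], float]:
    """Greedy forward search (a simplification of best-first search).

    Adds the feature that most improves the merit; stops when none does.
    """
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(features)))
    while remaining:
        merit, j = max((cfs_merit(selected + [c], features, target, measure), c)
                       for c in remaining)
        if merit <= best:
            break  # no candidate strictly improves the subset merit
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    X = [
        [1.0, 2.0, 3.0, 4.0, 5.0],    # informative
        [2.0, 4.0, 6.0, 8.0, 10.0],   # redundant copy of the first feature
        [1.0, -1.0, 0.0, -1.0, 1.0],  # unrelated to the target
    ]
    print(greedy_forward_select(X, y))
```

Because the measure is passed in as a function, switching from the stand-in to an MIC estimator would not change the merit or the search code, which is the substitution the abstract describes.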
Authors
罗幼喜
谢昆明
胡超竹
李翰芳
LUO Youxi; XIE Kunming; HU Chaozhu; LI Hanfang (School of Science, Hubei University of Technology, Wuhan 430068, China)
Source
《华中师范大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals
2023, No. 6, pp. 777-785 (9 pages)
Journal of Central China Normal University:Natural Sciences
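Funding
National Natural Science Foundation of China, Youth Program (11701161)
Humanities and Social Sciences Fund of the Ministry of Education (17YJA790098)
Key Humanities and Social Sciences Project of the Hubei Provincial Department of Education (20D043)
Doctoral Start-up Fund of Hubei University of Technology (BSQD2020103)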
Keywords
correlation-based feature selection
maximum information coefficient
feature selection
classification
dimension reduction