Abstract
Traditional single-label feature selection algorithms cannot be applied directly to multi-label data. To address this, a multi-label feature selection algorithm, MML-RF, is proposed. Building on ReliefF, MML-RF changes how within-class nearest-neighbor samples are defined and searched for, and introduces a new parameter into the feature-weight formula to account for the contribution values of the different labels, so that the algorithm suits the characteristics of multi-label data. To reduce feature redundancy, MML-RF uses mutual information as the redundancy measure and proposes a redundancy-removal method that yields a smaller feature subset. Experiments show that the feature subsets selected by MML-RF are smaller and give good classification performance on multi-label datasets, which can improve the efficiency of subsequent multi-label learning and data mining.
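The redundancy-removal idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual procedure, so everything below is an assumption made for illustration: the greedy filtering rule, the `threshold` parameter, the function names `mutual_information` and `remove_redundant`, and the toy weights standing in for ReliefF-style feature weights. Features are assumed to be already discretised.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def remove_redundant(features, weights, threshold):
    """Hypothetical greedy filter (not the paper's exact method).

    Visits features in descending weight order and keeps a feature only if
    its mutual information with every already-kept feature is below
    `threshold`. Returns the kept feature indices in selection order.
    """
    order = sorted(range(len(features)), key=lambda j: weights[j], reverse=True)
    kept = []
    for j in order:
        if all(mutual_information(features[j], features[k]) < threshold
               for k in kept):
            kept.append(j)
    return kept


# Toy data: each inner list is one discretised feature column.
features = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # f0
    [0, 0, 1, 1, 0, 0, 1, 1],  # f1: exact copy of f0, fully redundant
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: statistically independent of f0 here
]
weights = [0.9, 0.8, 0.5]  # made-up stand-ins for ReliefF-style weights
selected = remove_redundant(features, weights, threshold=0.1)
print(selected)
```

On this toy input the copy `f1` is dropped because its mutual information with `f0` equals the entropy of `f0`, while `f2` is kept because it shares no information with `f0`. In practice the threshold and the discretisation would need to be tuned for the dataset.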
Authors
陈平华
黄辉
麦淼
周宏虹
Chen Ping-hua; Huang Hui; Mai Miao; Zhou Hong-hong (School of Computers, Guangdong University of Technology, Guangzhou 510006, China; Guangdong Nanfang Media Group, Guangzhou 510601, China; Guangdong Science and Technology Innovation Monitoring and Research Center, Guangzhou 510033, China)
Source
《广东工业大学学报》
CAS
2018, No. 5, pp. 20-25, 50 (7 pages in total)
Journal of Guangdong University of Technology
Funding
National Natural Science Foundation of China (61572144)
Guangdong Provincial Science and Technology Plan Projects (2013B091300009, 2014B070706007, 2017B030307002)
Keywords
feature selection
multi-label learning
ReliefF
mutual information
feature redundancy