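Abstract
Handling complex multi-label data is a challenging task for feature selection. However, existing multi-label feature selection methods leave three problems unsolved. First, existing methods either use an instance-level manifold regularization term to preserve the similarity structure of instances or rely on label correlations to guide feature selection, yet the two provide complementary guidance for feature selection. Second, earlier methods explore label correlations through a neighborhood matrix constructed from instance similarity, ignoring the correlations between pairs of labels themselves. Third, earlier methods combine multiple unknown variables, which makes the objective function difficult to solve. To address these problems, this paper builds an empirical loss function on the least squares regression model. It then introduces a label regularization term into the objective function to explore the correlations between labels. It also represents the predicted labels as the product of the feature matrix and a reconstructed sparse coefficient matrix, which preserves the local geometric structure of the data. These terms are integrated into one joint learning framework, for which an optimization scheme with provable convergence is designed. Experiments on 13 real-world multi-label benchmark data sets verify the effectiveness of the proposed method.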
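The abstract describes the objective only in words. The following is one plausible way to write a single-variable objective of this kind. The notation is assumed here rather than taken from the paper:

- feature matrix $X\in\mathbb{R}^{n\times d}$ with rows $x_i$,
- label matrix $Y\in\{0,1\}^{n\times c}$,
- coefficient matrix $W\in\mathbb{R}^{d\times c}$ with rows $w^i$ and columns $w_k$,
- instance-graph Laplacian $L_S = D_S - S$ and label-graph Laplacian $L_C = D_C - C$,
- trade-off weights $\alpha,\beta,\gamma \ge 0$.

The paper's exact terms and weights may differ.

```latex
\min_{W}\; \|XW - Y\|_F^2
  + \alpha\,\operatorname{tr}\!\big(W^\top X^\top L_S X W\big)
  + \beta\,\operatorname{tr}\!\big(W L_C W^\top\big)
  + \gamma\,\|W\|_{2,1},
\qquad \|W\|_{2,1} = \sum_{i=1}^{d} \|w^{i}\|_2

\operatorname{tr}\!\big(W^\top X^\top L_S X W\big) = \tfrac{1}{2}\sum_{i,j} S_{ij}\,\|x_i W - x_j W\|_2^2,
\qquad
\operatorname{tr}\!\big(W L_C W^\top\big) = \tfrac{1}{2}\sum_{k,l} C_{kl}\,\|w_k - w_l\|_2^2
```

The first trace term keeps the predicted labels $x_i W$ of similar instances close. The second keeps the coefficient vectors $w_k$ of correlated labels close. $L_S$ and $L_C$ are positive semidefinite, so both trace terms are convex quadratics in $W$. The squared loss and the $L_{2,1}$-norm are also convex, so the whole objective is convex in its single unknown $W$. This is the kind of property the abstract appeals to for global optimality. Features are then typically ranked by the row norms $\|w^i\|_2$.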
Dealing with complicated multi-label data is a challenging task for feature selection in practical applications. However, three issues in existing multi-label feature selection methods remain unsolved.

First, previous methods either employ instance-level manifold regularization terms to preserve instance similarity or exploit label correlations to guide the feature selection process, although the two are complementary in feature selection. Second, existing methods explore label correlations through an affinity matrix built from instance similarity, ignoring the pairwise correlations between the labels themselves. Third, previous methods involve several unknown variables, which makes the objective function difficult to solve.

To tackle these issues, we construct an empirical loss function based on the least squares regression model. We then introduce a label regularization term to exploit label correlations. We also represent the predicted labels as the product of the feature matrix and the weight coefficient matrix, so that the local geometric structure of the data set is preserved. Finally, we integrate these terms into one joint learning framework and design an effective optimization method with provable convergence to solve it.

The novelties and main contributions of this paper are as follows.

- The proposed method uses an instance-level manifold regularization term to maintain instance similarity. At the same time, it introduces a label-level manifold regularization term to exploit label correlations.
- The method stores the geometric structure of the labels in the weight coefficient matrix and uses this matrix to guide the feature selection process. The sparse coefficient matrix maintains two relationships: the geometric relationship between the data space and the label space, and the relationships among labels. Because of this, the coefficient matrix learned during training gives the method superior classification ability on the test data set.
- The method introduces the $L_{2,1}$-norm, which combines the advantages of the $L_1$-norm and the $L_2$-norm, to select important features in each iteration.
- The method integrates all the above terms into one joint learning framework and solves the resulting constrained problem with a purpose-designed optimization scheme. The method regulates the regression coefficient matrix based on instance similarity and label similarity for multi-label feature selection, and is named RMLFS.
- The framework yields a globally optimal solution. Its objective function is convex and contains only one unknown variable. Existing methods, by contrast, involve multiple unknown variables, which in most cases leads to locally optimal solutions.

To verify the classification superiority of the proposed method, extensive experiments using multiple evaluation criteria are conducted on thirteen benchmark multi-label data sets. RMLFS is compared with eight competitive methods: MIFS, MDMR, SCLS, LRFS, mRMR, RALM-FS, TRCFS and GMM. The experimental results show that RMLFS outperforms the compared methods in classification performance.
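The abstract does not give the update rules, so the sketch below is a rough illustration only. It approximately minimizes an objective of the general shape described above, with $W$ as the only unknown:

- a squared loss on $XW$,
- an instance-graph term,
- a label-graph term,
- an $L_{2,1}$ penalty.

It uses the common iteratively reweighted scheme for the $L_{2,1}$-norm, in which each step solves a Sylvester equation $AW + WB = Q$. This is not the authors' RMLFS solver. The following are hypothetical choices made here for illustration:

- the Gaussian instance similarity and the cosine label similarity,
- the function names (`regularized_fs`, `solve_sylvester_small`, `instance_laplacian`, `label_laplacian`),
- the default weights.

The dense Kronecker-product solve is only practical for small $d \times c$. For larger problems, `scipy.linalg.solve_sylvester` is the usual choice.

```python
import numpy as np


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(S.sum(axis=1)) - S


def instance_laplacian(X, sigma=1.0):
    # Hypothetical choice: dense Gaussian (heat-kernel) similarity between instances.
    sq = np.sum(X**2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.exp(-dist2 / (2.0 * sigma**2))
    np.fill_diagonal(S, 0.0)
    return laplacian(S)


def label_laplacian(Y, eps=1e-12):
    # Hypothetical choice: cosine similarity between pairs of label columns.
    norms = np.linalg.norm(Y, axis=0) + eps
    C = (Y.T @ Y) / np.outer(norms, norms)
    np.fill_diagonal(C, 0.0)
    return laplacian(C)


def solve_sylvester_small(A, B, Q):
    # Solve A W + W B = Q using vec(AW + WB) = (I kron A + B^T kron I) vec(W),
    # with column-major (Fortran-order) vectorization.
    d, c = Q.shape
    M = np.kron(np.eye(c), A) + np.kron(B.T, np.eye(d))
    w = np.linalg.solve(M, Q.flatten(order="F"))
    return w.reshape((d, c), order="F")


def regularized_fs(X, Y, alpha=0.1, beta=0.1, gamma=0.1, n_iter=50, eps=1e-8):
    """Hypothetical IRLS sketch that approximately minimizes
    ||XW - Y||_F^2 + alpha tr(W^T X^T L_S X W) + beta tr(W L_C W^T) + gamma ||W||_{2,1}.
    Returns W and the feature indices ranked by the L2 norm of the rows of W."""
    d = X.shape[1]
    L_S = instance_laplacian(X)
    L_C = label_laplacian(Y)
    XtX = X.T @ X
    XtLX = X.T @ L_S @ X
    XtY = X.T @ Y
    D = np.eye(d)  # reweighting matrix standing in for the L2,1 term
    for _ in range(n_iter):
        # Zero gradient with D fixed: (X^T X + alpha X^T L_S X + gamma D) W + beta W L_C = X^T Y
        A = XtX + alpha * XtLX + gamma * D
        W = solve_sylvester_small(A, beta * L_C, XtY)
        # Standard L2,1 reweighting: D_ii = 1 / (2 ||w^i||_2), smoothed by eps
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))
    ranking = np.argsort(-np.linalg.norm(W, axis=1))
    return W, ranking
```

As a quick check, generate synthetic data in which the labels depend only on features 0 and 1 of six. Calling `regularized_fs(X, Y)` should then place those two features at the head of the returned ranking. The real method's graph construction and parameter tuning would matter far more on benchmark data.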
Authors
LI Yong-Hao (李永豪), HU Liang (胡亮), GAO Wan-Fu (高万夫)
College of Computer Science and Technology, Jilin University, Changchun 130012; Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012
Source
Chinese Journal of Computers (《计算机学报》), 2022, No. 9, pp. 1827-1841 (15 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
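Funding
National Key Research and Development Program of China (2017YFA0604500)
Key Science and Technology Research and Development Project of Jilin Province (20180201103GX)
Joint Fund Project of the Department of Science and Technology of Jilin Province (2020122209JC)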
Keywords
feature selection
multi-label learning
manifold learning
sparse learning
classification