Abstract
To better eliminate redundancy among features, a sparse unsupervised feature selection algorithm that incorporates feature redundancy learning into sparse learning is proposed. First, the algorithm uses the L1 norm to measure the loss between the projected data points and the cluster labels, and introduces an auxiliary variable to decouple the orthogonality and nonnegativity constraints on the coding matrix of the cluster labels, so that the coding matrix is nonnegative and closer to the ideal labels. Second, a feature redundancy matrix is constructed from the cosine similarity between features and used as a regularization term when learning the projection matrix, reducing the dependence among features. Finally, constraining the projection matrix with the L_{2,0} norm yields exactly k nonzero rows, which select k features of the original data. Together these steps give a sparse unsupervised feature selection model based on the L_{2,0}-norm constraint and feature redundancy learning. In comparisons with 10 related algorithms on 12 public datasets, the proposed algorithm selects more discriminative features in most cases.
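The three building blocks named in the abstract (an L1 fitting loss, a cosine-similarity redundancy regularizer, and an L_{2,0} row-sparsity constraint) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and the function names are hypothetical. Assumptions: the redundancy matrix A holds squared cosine similarities between feature columns with a zero diagonal, the regularizer is tr(WᵀAW), and the L_{2,0} constraint is enforced by keeping the k rows of W with the largest ℓ2 norms (the Euclidean projection onto matrices with at most k nonzero rows). The paper's auxiliary variable, orthogonality handling and alternating optimization are not reproduced.

```python
import math


def cosine_redundancy(X):
    """Redundancy matrix A (d x d) for data X (n samples x d features).

    Assumption: A[i][j] = cos(f_i, f_j)**2 for i != j, and 0 on the diagonal.
    """
    n, d = len(X), len(X[0])
    cols = [[X[r][j] for r in range(n)] for j in range(d)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j and norms[i] > 0 and norms[j] > 0:
                cos = sum(a * b for a, b in zip(cols[i], cols[j])) / (norms[i] * norms[j])
                A[i][j] = cos * cos
    return A


def redundancy_penalty(W, A):
    """tr(W^T A W), computed as sum_{i,j} A[i][j] * <w_i, w_j> over rows of W."""
    d = len(W)
    return sum(A[i][j] * sum(a * b for a, b in zip(W[i], W[j]))
               for i in range(d) for j in range(d))


def l1_loss(X, W, F):
    """||X W - F||_1: sum of absolute entries of the residual (F is n x c)."""
    total = 0.0
    for r in range(len(X)):
        for c in range(len(F[0])):
            proj = sum(X[r][j] * W[j][c] for j in range(len(W)))
            total += abs(proj - F[r][c])
    return total


def project_l20(W, k):
    """Keep the k rows of W with the largest l2 norm and zero the rest.

    Returns the projected matrix (at most k nonzero rows) and the sorted
    indices of the kept rows, i.e. the selected features.
    """
    row_norms = [math.sqrt(sum(v * v for v in row)) for row in W]
    ranked = sorted(range(len(W)), key=lambda i: -row_norms[i])
    keep = sorted(ranked[:k])
    keep_set = set(keep)
    Wk = [list(row) if i in keep_set else [0.0] * len(row)
          for i, row in enumerate(W)]
    return Wk, keep
```

For example, with `X = [[1, 0, 1], [2, 0, 2], [0, 1, 0]]`, features 0 and 2 are identical, so `A[0][2]` is 1 and the penalty grows whenever both rows of `W` are active; `project_l20(W, 2)` then keeps only the two largest-norm rows, which here drops the duplicate feature and brings the penalty to zero.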
Authors
MENG Yingying (蒙莹莹); LI Qiaoyan (李巧艳); YANG Xiaofei (杨小飞); YUAN Lin (袁林)
School of Science, Xi'an Polytechnic University, Xi'an 710600, China
Source
Journal of Zhengzhou University: Natural Science Edition (郑州大学学报(理学版))
Indexed in: CAS; Peking University Core Journals (北大核心)
2023, No. 5, pp. 81-88 (8 pages)
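Funding
National Natural Science Foundation of China (61976130)
Key Research and Development Program of Shaanxi Province (2018KW-021)
Natural Science Foundation of Shaanxi Province (2022KRM170)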
Keywords
feature selection
sparse learning
feature redundancy
matrix factorization
unsupervised learning