Abstract
To address the rapid growth of unlabeled high-dimensional data, this paper studies unsupervised feature selection in machine learning and proposes an unsupervised feature selection algorithm that combines a self-representation similarity matrix with manifold learning. First, a similarity matrix is constructed from the self-representation property of the data; building on the manifold learning idea that a low-dimensional manifold can represent the structure of high-dimensional data, an unsupervised feature selection optimization model that incorporates manifold learning is established. Second, to select more useful and sparser features, the l2,1 norm is used to constrain the optimization model, so that features compete with one another and redundancy is eliminated. The model is then solved by alternating iterative updates of the variables, and the convergence of the algorithm is proved. Finally, comparative experiments against several other unsupervised feature selection algorithms on four data sets demonstrate the effectiveness of the proposed algorithm.
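As a rough illustration of why an l2,1 constraint yields feature selection, the sketch below computes the l2,1 norm of a weight matrix (the sum of the Euclidean norms of its rows) and ranks features by row norm. This record does not give the paper's actual optimization model, so the function names `l21_norm` and `rank_features` and the matrix `W` are hypothetical and chosen only for illustration; ranking by row norm is the usual convention in l2,1-regularized feature selection and is assumed here, not taken from the paper.

```python
import math


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (given as a list of rows)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def rank_features(W, k):
    """Return the indices of the k rows of W with the largest Euclidean norm."""
    row_norms = [math.sqrt(sum(w * w for w in row)) for row in W]
    order = sorted(range(len(W)), key=lambda i: row_norms[i], reverse=True)
    return order[:k]


# Hypothetical weight matrix: one row per feature (illustrative values only).
W = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8], [0.0, 0.1]]

total = l21_norm(W)            # row norms are roughly 5, 0, 1 and 0.1
selected = rank_features(W, 2)  # the two features with the largest row norms
```

Because the penalty sums whole-row norms, shrinking it tends to push entire rows toward zero, which is what lets the near-zero rows (features) be discarded.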
Authors
周婉莹
马盈仓
郑毅
杨小飞
Zhou Wanying; Ma Yingcang; Zheng Yi; Yang Xiaofei (School of Science, Xi’an Polytechnic University, Xi’an 710048, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2020, No. 9, pp. 2634-2639 (6 pages)
Application Research of Computers
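Funding
National Natural Science Foundation of China (11501435)
Scientific Research Program of the Shaanxi Provincial Department of Education (18JS042)
Graduate Innovation Fund of Xi’an Polytechnic University (chx2019057)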
Keywords
unsupervised learning
feature selection
sparse regression
feature manifold learning