
Unsupervised feature selection via sparse representation clustering

Times cited: 2
Abstract: Feature selection aims to select a relevant subset of the original features, which facilitates data clustering, classification and retrieval. Most existing unsupervised feature selection algorithms map high-dimensional data into a low-dimensional space, compute a score for each feature independently, and select the top-ranked features. This paper proposes an unsupervised feature selection algorithm via sparse representation clustering. First, following the eigenmap idea of manifold learning, the data are mapped from the high-dimensional space into a low-dimensional space using Laplacian eigenmaps: a nearest-neighbor graph is constructed over the samples, and the low-dimensional space is obtained by embedding this graph, so that the reduced space preserves the manifold structure of the original dataset. Second, the resulting "flat" sample embedding matrix serves as an indicator of feature importance, distinguishing how much each feature contributes to each cluster; an objective function is then constructed from the fit between the low-dimensional and the high-dimensional spaces. Finally, this objective is essentially a regression problem, which is commonly solved with the Least Angle Regression (LARS) algorithm. L1-norm sparse regression is used to estimate the importance of the features jointly rather than evaluating each feature's contribution separately, and the features with the highest final scores are selected. Experiments on six real-world datasets show that the proposed algorithm achieves good clustering accuracy and mutual information, effectively selects important features, performs well in dimensionality reduction, and outperforms several typical feature selection algorithms used for comparison.
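The three steps in the abstract (graph embedding, sparse regression, feature scoring) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: the graph uses 0-1 weights over `k` nearest neighbours; the number of embedding dimensions equals the number of clusters; each embedding dimension is regressed on the centred features; and a feature's score is its largest absolute coefficient across dimensions (the rule used by MCFS-style methods; the paper may score differently). The L1-regularised regression is solved here by cyclic coordinate descent instead of LARS. Both minimise the same lasso objective, but LARS traces the whole regularisation path. The function names `knn_graph`, `laplacian_eigenmap`, `lasso_cd`, `sparse_clustering_scores` and the parameters `k` and `lam` are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the pipeline; not the paper's code.


def knn_graph(X, k=5):
    """Symmetric k-nearest-neighbour graph with 0-1 weights (an assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist2, np.inf)  # a sample is not its own neighbour
    W = np.zeros_like(dist2)
    for i in range(X.shape[0]):
        W[i, np.argsort(dist2[i])[:k]] = 1.0
    return np.maximum(W, W.T)  # connect i and j if either is a neighbour of the other


def laplacian_eigenmap(W, n_dims):
    """Smallest non-trivial solutions of L y = lambda D y, with L = D - W."""
    d_isqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_isqrt[:, None] * W * d_isqrt[None, :]  # D^{-1/2} W D^{-1/2}
    vals, vecs = np.linalg.eigh(S)
    # Largest eigenvalues of S correspond to the smallest lambda;
    # drop the top (trivial) one and keep the next n_dims.
    idx = np.argsort(vals)[::-1][1:n_dims + 1]
    return d_isqrt[:, None] * vecs[:, idx]  # map back: y = D^{-1/2} v


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5*||y - X a||^2 + lam*||a||_1 by cyclic coordinate descent."""
    a = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    r = y.copy()  # residual y - X a
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * a[j]  # remove feature j's current contribution
            rho = X[:, j] @ r
            a[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * a[j]
    return a


def sparse_clustering_scores(X, n_clusters, k=5, lam=0.5):
    Xc = X - X.mean(axis=0)  # centre the features
    Y = laplacian_eigenmap(knn_graph(X, k), n_clusters)
    Y = Y - Y.mean(axis=0)  # centre the regression targets as well
    A = np.column_stack([lasso_cd(Xc, Y[:, c], lam) for c in range(n_clusters)])
    # A[j, c] is feature j's coefficient for embedding dimension c;
    # score each feature by its largest absolute coefficient (assumed rule).
    return np.abs(A).max(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
    labels = np.repeat(np.arange(3), 30)
    informative = centers[labels] + rng.normal(scale=0.5, size=(90, 2))
    noise = rng.normal(scale=0.1, size=(90, 4))  # features 2..5 carry no cluster signal
    X = np.hstack([informative, noise])
    scores = sparse_clustering_scores(X, n_clusters=3)
    print("scores:", np.round(scores, 4))
    print("top-2 features:", np.argsort(scores)[::-1][:2])
```

On this synthetic data only features 0 and 1 separate the three clusters, so they should receive the highest scores. On real datasets, `k` and `lam` would need tuning, and selecting the top-ranked features simply means sorting `scores` in descending order.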
Source: Journal of Nanjing University (Natural Science), 2018, No. 1, pp. 107-115 (9 pages). Indexed in: CAS, CSCD, Peking University Core Journals
Funding: National Natural Science Foundation of China (61703196)
Keywords: unsupervised feature selection; manifold learning; Laplacian eigenmaps; sparse regression