Adaptive Neighborhood Embedding Based Unsupervised Feature Selection (cited by: 9)
Abstract Unsupervised feature selection can effectively reduce the dimensionality of high-dimensional unlabeled data, which lowers the time and space complexity of data processing and helps the learned model avoid overfitting. However, most existing unsupervised feature selection methods use the k-nearest neighbor method to capture the local geometric structure of the data samples and ignore the problem of unevenly distributed data. To address this problem, an adaptive neighborhood embedding based unsupervised feature selection (ANEFS) algorithm is proposed. The algorithm determines the number of neighbors of each sample according to the distribution of the dataset itself and then constructs the sample similarity matrix. It also introduces an intermediate matrix that maps from the high-dimensional space to the low-dimensional space, and the objective function is optimized with the Lagrange multiplier method. Experimental results on six UCI datasets show that the proposed algorithm selects feature subsets with higher clustering accuracy and normalized mutual information.
Authors Liu Yanfang; Li Wenbin; Gao Yang (State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023; College of Mathematics and Information Engineering, Longyan University, Longyan, Fujian 364012)
Source Journal of Computer Research and Development, 2020, No. 8, pp. 1639-1649 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding National Key Research and Development Program of China (2017YFB0702600, 2017YFB0702601); National Natural Science Foundation of China (61806096); Fujian Province Education and Research Project for Young and Middle-aged Teachers (Science and Technology) (JAT170577, JAT190743); Longyan Science and Technology Plan Project (2019LYF13002).
Keywords k-nearest neighbor; adaptive neighborhood; manifold learning; feature selection; unsupervised learning
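
The abstract describes choosing a per-sample neighbor count from the data distribution and then building a similarity matrix, but it does not give the rule or the objective. The sketch below is therefore only an illustration under stated assumptions, not the ANEFS method: the neighbor-count rule in `adaptive_neighbor_counts` is hypothetical, the Gaussian weighting is a common default, and the feature ranking uses the standard Laplacian score as a stand-in for the paper's objective. All function and parameter names are invented for this example.

```python
import numpy as np

def adaptive_neighbor_counts(X, k_min=2, k_max=10):
    """Hypothetical rule (not from the paper): for each sample, count the
    points that lie closer than the mean distance to its k_max nearest
    points, then clip the count to [k_min, k_max]."""
    n = X.shape[0]
    k_max = min(k_max, n - 1)
    k_min = min(k_min, k_max)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k_max]
    radius = nearest.mean(axis=1, keepdims=True)
    counts = (nearest < radius).sum(axis=1)
    return np.clip(counts, k_min, k_max), dist

def similarity_matrix(X, k_min=2, k_max=10):
    """Symmetric Gaussian similarity over the adaptively sized neighborhoods."""
    counts, dist = adaptive_neighbor_counts(X, k_min, k_max)
    n = X.shape[0]
    order = np.argsort(dist, axis=1)          # self (inf) is sorted last
    sigma = dist[np.isfinite(dist)].mean()    # assumed global bandwidth
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = order[i, :counts[i]]
        W[i, nbrs] = np.exp(-dist[i, nbrs] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                 # keep an edge if either side has it

def laplacian_scores(X, W):
    """Standard Laplacian score per feature; lower means the feature
    better preserves the local structure encoded in W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                        # unnormalized graph Laplacian
    Xc = X - (d @ X) / d.sum()                # remove the degree-weighted mean
    num = np.sum(Xc * (L @ Xc), axis=0)
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)
    return num / np.maximum(den, 1e-12)
```

Features would then be ranked by ascending score, for example `np.argsort(laplacian_scores(X, similarity_matrix(X)))`. In dense regions the hypothetical rule tends to admit more neighbors than in sparse ones, which is the kind of behavior a fixed k cannot provide.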