Abstract
To better pre-process unlabeled data, most graph-regularized unsupervised feature selection algorithms construct a similarity matrix over the samples in order to remove redundant information and select a representative feature subset. However, most of the graphs in these methods are initialized with a fixed number of neighbors, which ignores uneven data distribution. To address this problem, an unsupervised feature selection algorithm based on adaptive neighborhood regularized self-representation (ANRSR) is proposed to select representative and discriminative feature subsets. To preserve the local intrinsic structure, the algorithm incorporates manifold regularization based on an adaptive neighborhood into the self-representation model and uses an iterative method to solve the resulting optimization problem. Finally, comparative experiments against four classic unsupervised feature selection algorithms on several benchmark datasets verify that the proposed algorithm selects discriminative feature subsets with higher clustering accuracy and mutual information.
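The abstract does not state the objective function, so the following is only a minimal sketch of the general idea (graph-regularized self-representation solved iteratively), not the authors' ANRSR formulation. It assumes a commonly used closed-form adaptive-neighbor similarity, a squared Frobenius reconstruction loss, and an l2,1 penalty on the coefficient matrix handled by iterative reweighting. The function names and the parameters `k`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np

def adaptive_neighbor_graph(X, k=5):
    """Assumed closed-form adaptive-neighbor similarity (requires n >= k + 2).

    Each sample gets nonzero weights only on its k nearest neighbors, and the
    weights depend on the local distances rather than on a fixed kernel width.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist, np.inf)          # a sample is not its own neighbor
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(dist[i])[:k + 1]   # k neighbors plus the (k+1)-th
        d = dist[i, idx]
        denom = k * d[k] - d[:k].sum() + 1e-12
        S[i, idx[:k]] = (d[k] - d[:k]) / denom
    return (S + S.T) / 2.0                  # symmetrize

def graph_regularized_self_representation(X, k=5, alpha=1.0, beta=1.0,
                                           n_iter=30, eps=1e-8):
    """Illustrative solver (an assumption, not the paper's update rules) for
        min_W ||X - XW||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}
    Returns one score per feature (row norms of W) and W itself.
    """
    S = adaptive_neighbor_graph(X, k)
    L = np.diag(S.sum(axis=1)) - S          # graph Laplacian
    G = X.T @ X
    M = G + alpha * X.T @ L @ X
    W = np.eye(X.shape[1])
    for _ in range(n_iter):
        # Reweighting step for the l2,1 norm: D_ii = 1 / (2 * ||w_i||)
        row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))
        # Stationarity condition: (M + beta * D) W = X^T X
        W = np.linalg.solve(M + beta * D, G)
    scores = np.linalg.norm(W, axis=1)
    return scores, W

# Usage on synthetic data: rank features by score, highest first.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((40, 6))
scores_demo, W_demo = graph_regularized_self_representation(X_demo, k=5)
ranking_demo = np.argsort(-scores_demo)
```

In this sketch, features whose rows in `W` have large norms contribute most to reconstructing the other features, so the top-ranked indices in `ranking_demo` would form the selected subset. The weights `alpha` and `beta` trade off local structure preservation against row sparsity and would need tuning per dataset.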
Authors
Peng Ming (彭明)
Zhang Jiyan (张继炎)
Wang Huiling (王慧玲)
Huang Hongkun (黄宏昆)
Liu Yanfang (刘艳芳)
College of Mathematics and Information Engineering, Longyan University, Longyan 364012, China; Department of Electronics and Information Engineering, Yili Normal University, Yining 835000, China
Source
Journal of Nanjing University of Science and Technology (《南京理工大学学报》)
CAS
CSCD
PKU Core (北大核心)
2021, No. 4, pp. 439-446 (8 pages)
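Funding
Fujian Provincial Education and Research Project for Young and Middle-aged Teachers (Science and Technology) (JAT190743)
Longyan Science and Technology Plan Project (2019LYF13002)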
Keywords
adaptive neighborhood
self-representation
manifold learning
feature selection
unsupervised learning