Abstract
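In unsupervised feature selection for clustering, a local Euclidean metric can scramble the topological structure of the local manifold and degrade the clustering performance of the selected features. To address this, a multi-cluster feature selection algorithm based on the Grassmann manifold is proposed. Local principal component analysis is used to approximate the tangent space at each data point and capture the main directions of variation of the local data. A Grassmann manifold is constructed from the tangent spaces, the manifold topology of the local data is preserved through the geodesic distance, and L1-norm optimization is used to approximate the manifold topology and select the original data features that favor clustering. Experimental results verify the effectiveness of the algorithm.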
In unsupervised feature selection for clustering, the local topology used by spectral clustering is usually built from Euclidean distances, which can scramble the topology of the manifold within small neighborhoods. The scrambled topology can degrade clustering performance. In this paper, a Grassmann manifold-based Multi-Cluster Feature Selection (MCFS) algorithm is proposed to solve this problem. The tangent space at each data point is approximated by local principal component analysis, which captures the main directions of variation of the local data and filters out the influence of the scrambling points introduced by the Euclidean distance. A Grassmann manifold is constructed from the tangent spaces, and its geodesic distance preserves the topological structure of the local data. The topology of the manifold is then approximated by L1-norm optimization, and a subset of the original features is selected. Experimental results confirm the effectiveness of the algorithm.
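The first two steps described in the abstract (local PCA for tangent spaces, then a geodesic distance on the Grassmann manifold) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `local_tangent_basis` and `grassmann_geodesic`, the neighborhood size `k`, and the tangent dimension `d` are assumptions introduced here, and the geodesic distance is taken to be the common principal-angle form (the 2-norm of the vector of principal angles), which the abstract does not spell out. The L1-norm feature selection step is omitted.

```python
import numpy as np


def local_tangent_basis(X, i, k, d):
    """Approximate the tangent space at X[i] by local PCA.

    X is an (n_samples, n_features) array. The k nearest Euclidean
    neighbors of X[i] (plus the point itself) form a local patch; the
    top-d principal directions of that patch are returned as the
    columns of an orthonormal (n_features, d) matrix.
    """
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[: k + 1]          # index i is included (distance 0)
    patch = X[nbrs] - X[nbrs].mean(axis=0)     # center the local patch
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(patch, full_matrices=False)
    return vt[:d].T


def grassmann_geodesic(A, B):
    """Geodesic distance between the subspaces spanned by A and B.

    A and B must have orthonormal columns and the same shape. The
    singular values of A^T B are the cosines of the principal angles;
    the distance is the 2-norm of the vector of those angles.
    """
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))  # clip guards rounding error
    return float(np.linalg.norm(angles))
```

A pairwise distance between two samples `i` and `j` would then be `grassmann_geodesic(local_tangent_basis(X, i, k, d), local_tangent_basis(X, j, k, d))`. Points whose local patches vary in the same directions get a distance near zero even when a stray Euclidean neighbor sits close by, which is the filtering effect the abstract attributes to the tangent-space construction.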
Source
《计算机工程》
CAS
CSCD
2012, Issue 16, pp. 178-181 (4 pages)
Computer Engineering
Funding
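National Natural Science Foundation of China (61073092)
National International Science and Technology Cooperation Program of China (2011DFR10480)
Natural Science Special Fund of the Shaanxi Provincial Department of Education (2010JK718)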