Abstract
This paper proposes a feature-vector clustering method based on a particular organization of the data points. First, high-density data points in the feature space, which are relatively easy to cluster, are selected as the initial seed set and clustered. Next, all k-nearest neighbors of the seed set are selected from the remaining data points, and the current seed set together with these k-nearest neighbors is projected into a new space by semi-supervised discriminant analysis. The data points are re-clustered in that space to obtain a new clustering result, and the k-nearest neighbors are merged into the current seed set. These steps are iterated, and the algorithm terminates when the set of k-nearest neighbors of the seed set is empty. Experimental results show that the method outperforms classical clustering algorithms such as K-means, mean shift, and spectral clustering.
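The iteration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions: density is approximated by the distance to the k-th nearest neighbor, the initial seeds are grouped by simple single-linkage within a hypothetical `link_radius`, re-clustering is done by nearest-centroid assignment, and the semi-supervised discriminant analysis projection is replaced by a pluggable `project` hook that defaults to the identity. The names `initial_seeds`, `grow_clusters`, `n_seeds`, and `link_radius` are all illustrative.

```python
import math
from collections import defaultdict


def knn(i, candidates, X, k):
    """Indices of the (up to) k points in `candidates` closest to point i."""
    return sorted(candidates, key=lambda j: math.dist(X[i], X[j]))[:k]


def centroids(Z, labels):
    """Mean of the already-labelled points of each cluster, in space Z."""
    groups = defaultdict(list)
    for i, c in labels.items():
        groups[c].append(Z[i])
    return {c: [sum(col) / len(pts) for col in zip(*pts)]
            for c, pts in groups.items()}


def initial_seeds(X, k, n_seeds, link_radius):
    """Pick the n_seeds densest points and group them by single linkage."""
    n = len(X)
    # Density proxy (an assumption): a small distance to the k-th nearest
    # neighbour means the point lies in a dense region.
    kth = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        kth[i] = math.dist(X[i], X[knn(i, others, X, k)[-1]])
    seeds = sorted(range(n), key=kth.get)[:n_seeds]
    labels, next_label = {}, 0
    for s in seeds:
        if s in labels:
            continue
        labels[s] = next_label
        stack = [s]
        while stack:  # flood-fill over seeds closer than link_radius
            u = stack.pop()
            for v in seeds:
                if v not in labels and math.dist(X[u], X[v]) <= link_radius:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


def grow_clusters(X, k=2, n_seeds=4, link_radius=1.0,
                  project=lambda points, point_labels: points):
    """Grow the seed set through its k-nearest neighbours until none remain.

    `project` stands in for the projection step; the identity default is a
    placeholder, not semi-supervised discriminant analysis.
    """
    labels = initial_seeds(X, k, n_seeds, link_radius)
    remaining = set(range(len(X))) - set(labels)
    while True:
        # k-nearest neighbours of every seed point among the remaining points.
        neighbours = set()
        for s in labels:
            neighbours.update(knn(s, remaining, X, k))
        if not neighbours:  # empty neighbour set: stop
            break
        active = list(labels) + sorted(neighbours)
        projected = project([X[i] for i in active],
                            [labels.get(i) for i in active])
        Z = dict(zip(active, projected))
        # Re-cluster seeds and neighbours in the projected space.
        cents = centroids(Z, labels)
        for i in active:
            labels[i] = min(cents, key=lambda c: math.dist(Z[i], cents[c]))
        remaining -= neighbours  # neighbours are now part of the seed set
    return labels
```

Calling `grow_clusters(points, k=2, n_seeds=8)` on a list of coordinate tuples returns a dictionary mapping each point index to a cluster label. To approximate the paper's method more closely, `project` would need to be replaced by an actual discriminant projection fitted on the labelled seeds and unlabelled neighbors.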
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals
2010, No. 3, pp. 320-326 (7 pages)
Pattern Recognition and Artificial Intelligence
Funding
Supported by the National 863 Program of China (No.2007AA01Z166)
and the National Natural Science Foundation of China (No.60805006)
Keywords
Feature Vector, Clustering, Semi-Supervised Discriminant Analysis, Mean Shift