Abstract
To address the low accuracy and efficiency of existing subspace clustering algorithms, this paper proposes a subspace clustering algorithm named DSUB. A candidate pruning method reduces the number of candidate objects for clustering and divides them into groups, so that any cluster to be found must be a subset of a single group, which lowers the complexity of subsequent clustering. In addition, a new neighborhood query method and a sampling coverage strategy are introduced to speed up density-based clustering. Experimental results show that DSUB achieves high accuracy and can discover clusters of arbitrary shape, that its computational complexity is linear in the amount of data, that it is robust to noise, and that its clustering results do not depend on the order of processing. DSUB is therefore well suited to subspace clustering.
Source
Journal of Tianjin University (Science and Technology)
EI
CAS
CSCD
PKU Core
2010, No. 7, pp. 623-628 (6 pages)
Funding
Science and Technology Development Fund of Tianjin Higher Education Institutions (20080810)
China Postdoctoral Science Foundation (20090450767)
Keywords
high-dimensional data
subspace
clustering
data mining