Abstract
Real-world data often lie in high-dimensional spaces. Viewed across the full vector space, such data are so sparsely distributed that good clusters are difficult to find, so reducing dimensionality in order to cluster high-dimensional data has attracted wide attention. This paper presents a Random Projected Clustering algorithm (RanPC) for high-dimensional categorical data. The algorithm selects the dimensions relevant to each cluster according to value frequency, generates cluster centers at random, and keeps the centers and associated dimensions that yield the best projected clustering, thereby extending projected clustering from numerical to categorical data spaces. Experimental results confirm the practicality and effectiveness of the algorithm.
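The abstract gives only the outline of the method, so the following is a minimal Python sketch of that outline and not the RanPC algorithm as published. Several choices are assumptions made for illustration: centers are drawn from the data points, a dimension counts as relevant when at least `min_freq` of a center's neighbors share the center's value on it, distance is the mismatch fraction over the selected dimensions, and the trial with the lowest total projected distance is kept. All function and parameter names are hypothetical.

```python
import random


def relevant_dims(center, neighbors, min_freq):
    """Dimensions on which at least min_freq of neighbors share the center's value."""
    n = len(neighbors)
    if n == 0:
        # No neighbors to vote: fall back to every dimension.
        return list(range(len(center)))
    return [
        d for d in range(len(center))
        if sum(1 for p in neighbors if p[d] == center[d]) / n >= min_freq
    ]


def projected_distance(point, center, dims):
    """Fraction of the selected dimensions on which point and center disagree."""
    if not dims:
        return 1.0  # a center with no relevant dimensions matches nothing
    return sum(1 for d in dims if point[d] != center[d]) / len(dims)


def random_projected_clustering(data, k, trials=20, min_freq=0.8, seed=0):
    """Try random center sets; keep the one with the lowest projected cost."""
    rng = random.Random(seed)
    all_dims = list(range(len(data[0])))
    best = None
    for _ in range(trials):
        centers = rng.sample(data, k)
        # Step 1: rough neighborhoods using all dimensions.
        groups = [[] for _ in range(k)]
        for p in data:
            j = min(range(k),
                    key=lambda i: projected_distance(p, centers[i], all_dims))
            groups[j].append(p)
        # Step 2: frequency-based selection of relevant dimensions per center.
        dims = [relevant_dims(c, g, min_freq) for c, g in zip(centers, groups)]
        # Step 3: reassign points in each center's projected subspace.
        labels = [
            min(range(k),
                key=lambda i: projected_distance(p, centers[i], dims[i]))
            for p in data
        ]
        cost = sum(projected_distance(p, centers[l], dims[l])
                   for p, l in zip(data, labels))
        # Step 4: keep the best trial seen so far.
        if best is None or cost < best[0]:
            best = (cost, labels, centers, dims)
    return best


if __name__ == "__main__":
    # Two clusters that agree on dimensions 0 and 1 and are noisy elsewhere.
    data = [
        ("a", "b", "p", "q"), ("a", "b", "r", "s"), ("a", "b", "t", "u"),
        ("x", "y", "p", "s"), ("x", "y", "r", "u"), ("x", "y", "t", "q"),
    ]
    cost, labels, centers, dims = random_projected_clustering(data, k=2)
    print(labels, dims)
```

Selecting the best of several random trials stands in for the "choose good centers according to the clustering effect" step; the cost function used here is one simple choice and can favor clusters with very few selected dimensions, so a practical implementation would likely need a stronger quality measure.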
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
PKU Core Journal (北大核心)
2006, Issue 9, pp. 1605-1607 (3 pages)
Funding
Supported by the National "973" Major Basic Research Program of China (G1999032805)
Keywords
data mining
categorical attributes
Random Projected Clustering
high-dimensional data