Abstract
Because the traditional BIRCH algorithm uses cluster diameter to control cluster boundaries, it performs poorly when clusters are not spherical, and it applies only to a single table. To address these shortcomings, this paper proposes an improved BIRCH algorithm, IBIRCH. The algorithm first links multiple tables through tuple ID propagation, so that BIRCH can be applied to multi-table relational databases, and then computes shared nearest neighbor (SNN) density to discover clusters of arbitrary shape. Experiments show that the algorithm is efficient and scalable and produces accurate clustering results.
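The abstract does not define SNN density, so the following is only a minimal sketch of the commonly used definition, not the paper's implementation. It assumes that two points are similar only if each appears in the other's k-nearest-neighbor list, that their similarity is the number of neighbors they share, and that a point's SNN density is the number of points whose similarity to it reaches a threshold. The function names and the parameters `k` and `eps` are hypothetical.

```python
from math import dist


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Shared-neighbor count; zero unless i and j list each other as neighbors."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


def snn_density(points, k, eps):
    """Number of points whose SNN similarity to each point is at least eps."""
    neighbors = knn_lists(points, k)
    n = len(points)
    return [
        sum(
            1
            for j in range(n)
            if j != i and snn_similarity(neighbors, i, j) >= eps
        )
        for i in range(n)
    ]


# Four points forming a tight square plus one distant outlier (assumed toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
densities = snn_density(points, k=3, eps=2)
```

In this toy data the four square corners support each other while the outlier has no mutual neighbors, so it receives zero density regardless of how far away it is. This reliance on neighbor agreement rather than raw distance is what lets density-based grouping follow non-spherical clusters.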
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2008, No. 3, pp. 180-182, 208 (4 pages)
Funding
National Natural Science Foundation of China (60673136)
Keywords
BIRCH algorithm
Hierarchical clustering
Tuple ID propagation
SNN density