Application of a Fast Clustering Algorithm in Personalized Services

Fast Cluster Algorithm Applied in Individuation Information Service

Abstract: Many practical applications have shown that the k-means algorithm produces good clustering results. However, the time complexity and pattern complexity of the direct k-means algorithm are highly sensitive to the size of the data set, so the algorithm cannot meet the needs of some high-performance applications, such as group analysis of user data in personalized services. To address this, the authors propose a new clustering algorithm based on the k-d tree. The algorithm organizes all sample data in a k-d tree, a spatial data structure, which makes it possible to efficiently find all patterns that are closest to a given cluster center. Experimental results show that the scheme significantly speeds up the direct k-means algorithm, improving performance by one to two orders of magnitude in both the number of distance calculations and the total computation time.
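The abstract does not give the algorithm's details, so the following is only a minimal sketch of one common way a k-d tree can speed up the k-means assignment step, not the authors' method. It assumes each tree node stores the bounding box of its points and discards candidate centers that cannot be nearest to any point in that box; when a single candidate remains, the whole node is assigned at once without per-point distance calculations. All names (`build`, `assign`, `kmeans`, `leaf_size`) are hypothetical.

```python
import random

def build(points, idxs, leaf_size=8, depth=0):
    """Build a k-d tree node over points[idxs]; each node keeps its bounding box."""
    dim = len(points[0])
    lo = [min(points[i][d] for i in idxs) for d in range(dim)]
    hi = [max(points[i][d] for i in idxs) for d in range(dim)]
    node = {"idxs": idxs, "lo": lo, "hi": hi, "kids": None}
    if len(idxs) > leaf_size:
        axis = depth % dim  # cycle through the coordinate axes
        order = sorted(idxs, key=lambda i: points[i][axis])
        mid = len(order) // 2
        node["kids"] = (build(points, order[:mid], leaf_size, depth + 1),
                        build(points, order[mid:], leaf_size, depth + 1))
    return node

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def min_sq_dist(c, lo, hi):
    """Smallest squared distance from center c to any point of the box."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(c, lo, hi))

def max_sq_dist(c, lo, hi):
    """Largest squared distance from center c to the box (reached at a corner)."""
    return sum(max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(c, lo, hi))

def assign(node, points, centers, cand, labels):
    """Label every point in node with its nearest center, pruning candidates."""
    lo, hi = node["lo"], node["hi"]
    # A center whose closest approach to the box is farther than another
    # center's farthest corner can never be nearest for a point in the box.
    bound = min(max_sq_dist(centers[j], lo, hi) for j in cand)
    cand = [j for j in cand if min_sq_dist(centers[j], lo, hi) <= bound]
    if len(cand) == 1:
        for i in node["idxs"]:          # bulk assignment, no distance calls
            labels[i] = cand[0]
    elif node["kids"] is None:
        for i in node["idxs"]:          # leaf: compare surviving candidates only
            labels[i] = min(cand, key=lambda j: sq_dist(points[i], centers[j]))
    else:
        for kid in node["kids"]:
            assign(kid, points, centers, cand, labels)

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means loop whose assignment step uses the tree built once."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    root = build(points, list(range(len(points))))
    labels = [0] * len(points)
    for _ in range(iters):
        assign(root, points, centers, list(range(k)), labels)
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            if members:                 # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

The tree is built once and reused in every iteration, since only the centers move. In this sketch the centroid update still scans all points; storing per-node coordinate sums and counts would let that step be done in bulk as well.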

Authors: Zhang Jian (张剑), Li Wei (李卫), Zhong Yixin (钟义信), Guo Yanhui (郭燕慧)

Affiliation: School of Information Engineering, Beijing University of Posts and Telecommunications

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 12, pp. 10-11, 219 (3 pages in total)

Funding: Supported by the National 863 High-Tech Research and Development Program of China (No. 2002AA117010-07)

Keywords: clustering, k-means, error function, k-d tree, personalized service

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]


References (6)

[1] M Ester, H Kriegel, X Xu. Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification[C]. In: Proc of the Fourth Int'l Symposium on Large Spatial Databases, 1995: 102-112
[2] D Judd, P McKinley, A Jain. Large-Scale Parallel Data Clustering[C]. In: Proc Int'l Conference on Pattern Recognition, 1996-08
[3] L Kaufman, P J Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis[M]. John Wiley & Sons, 1990
[4] R T Ng, J Han. Efficient and Effective Clustering Methods for Spatial Data Mining[C]. In: Proc of the 20th Int'l Conf on Very Large Databases, Santiago, Chile, 1994: 144-155
[5] E Schikuta. Grid Clustering: An Efficient Hierarchical Clustering Method for Very Large Data Sets[C]. In: Proc 13th Int'l Conference on Pattern Recognition, 1996
[6] T Zhang, R Ramakrishnan, M Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases[C]. In: Proc of the 1996 ACM SIGMOD Int'l Conf on Management of Data, Montreal, Canada, 1996: 103-114
