
A density-based distributed clustering algorithm (cited by: 10)

An improved density based distributed clustering
Abstract As networked applications spread, large volumes of data are stored across distributed sites. Distributed clustering is a challenging research topic because of real-world constraints such as limited bandwidth and limited memory at each site. This paper improves the density-based distributed clustering algorithm DBDC (density based distributed clustering) and proposes DBDC*. When selecting local representatives, DBDC* applies the Bayesian information criterion (BIC) to obtain a small number of BIC core points that accurately reflect the data distribution at each local site. This reduces the volume of data communicated during distributed clustering, and the global clustering step takes the data distribution of every site into account. DBDC* operates on two levels, local and global. On the local level, each site runs DBSCAN independently of the others; once clustering completes, a local model made up of BIC core points is determined. The local models are then transferred to a central site, where they are merged into a global model by analyzing the local BIC core points. Each local representative is assigned a global cluster identifier, the resulting global clustering is broadcast to all local sites, and every site updates its local model accordingly. Experimental results show that DBDC* is more efficient than DBDC and produces good clustering results.
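The two-level flow described in the abstract (local DBSCAN, a small representative set per site, a merge at a central site, then relabeling at each site) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not give the BIC-based selection rule, so `local_model` substitutes a simple stand-in: it keeps core points of a cluster that lie more than `eps` apart. The merge step is likewise a plain union-find over representatives within a hypothetical `eps_global` (set here to `2 * eps`). All function names and the toy data are invented for illustration.

```python
import math
from collections import deque


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags); label -1 marks noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels, core


def local_model(points, eps, min_pts):
    """Local level: cluster one site and keep a sparse set of representatives.

    ASSUMPTION: stand-in for the BIC core point selection, whose details
    are not given in the abstract.
    """
    labels, core = dbscan(points, eps, min_pts)
    reps = []  # list of (point, local_cluster_id)
    for i, p in enumerate(points):
        if core[i] and all(lab != labels[i] or math.dist(p, r) > eps
                           for r, lab in reps):
            reps.append((p, labels[i]))
    return labels, reps


def global_model(site_reps, eps_global):
    """Global level: merge representatives sent by all sites.

    ASSUMPTION: union-find merge; two representatives are joined when they
    share a local cluster or lie within eps_global of each other.
    Returns {(site, local_cluster_id): global_cluster_id}.
    """
    flat = [(site, lab, p) for site, reps in enumerate(site_reps) for p, lab in reps]
    parent = list(range(len(flat)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (si, li, pi) in enumerate(flat):
        for j in range(i):
            sj, lj, pj = flat[j]
            if (si, li) == (sj, lj) or math.dist(pi, pj) <= eps_global:
                parent[find(i)] = find(j)

    roots, mapping = {}, {}
    for i, (site, lab, _) in enumerate(flat):
        mapping[(site, lab)] = roots.setdefault(find(i), len(roots))
    return mapping


def distributed_clustering(sites, eps, min_pts):
    """Run both levels and return one list of global labels per site."""
    local = [local_model(points, eps, min_pts) for points in sites]
    mapping = global_model([reps for _, reps in local], eps_global=2 * eps)
    # "Broadcast" step: each site replaces local ids with global ids.
    return [[mapping[(site, lab)] if lab != -1 else -1 for lab in labels]
            for site, (labels, _) in enumerate(local)]


# Toy data: two sites that share one dense region near the origin.
site_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
site_b = [(0.2, 0.1), (0.1, 0.2), (0.2, 0.2), (9.0, 0.0)]
global_labels = distributed_clustering([site_a, site_b], eps=0.3, min_pts=3)
print(global_labels)
```

In this toy run only three representative points leave the two sites instead of ten raw points, which is the communication saving the abstract refers to; how much is saved in practice depends on the selection rule actually used.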
Source: Journal of Nanjing University (Natural Science), 2008, No. 5, pp. 536-543 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (40771163)
Keywords: clustering; distributed clustering; density-based spatial clustering of applications with noise (DBSCAN); density based distributed clustering (DBDC)

References (10)

  • 1. Zhao Peng, Geng Huantong, Wang Qingyi, Cai Qingsheng. Research on a personalized automatic article recommendation system based on clustering and classification [J]. Journal of Nanjing University (Natural Science), 2006, 42(5): 512-518. (cited by: 13)
  • 2. Januzaj E, Kriegel H P, Pfeifle M. DBDC: Density based distributed clustering. Proceedings of the 9th International Conference on Extending Database Technology. Heraklion: Springer, 2004, 88-105.
  • 3. Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland: AAAI, 1996, 226-231.
  • 4. Bezdek J C, Nikhil R P. Some new indexes of cluster validity. IEEE Transactions on Systems, Man and Cybernetics-Part B, 1998, 28(3): 301-310.
  • 5. Kass R, Wasserman L. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 1995, 773-795.
  • 6. Dataset [DB/OL]. http://www.ics.uci.edu/~mlearn/databases/student/. 1999-10-28.
  • 7. Dataset [DB/OL]. http://www.ics.uci.edu/~mlearn/databases/iris/. 1999-10-28.
  • 8. Dataset [DB/OL]. http://www.ics.uci.edu/~mlearn/databases/glass/. 1999-10-28.
  • 9. The third international knowledge discovery and data mining tools competition dataset [DB/OL]. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. 1999-10-28.
  • 10. Modha D S, Spangler W S. Feature weighting in k-means clustering. Machine Learning, 2003, 52(3): 217-237.
