期刊文献+

基于分布式哈希表的分布式子空间聚类算法

Distributed Hash table-based distributed subspace clustering algorithm
下载PDF
导出
摘要 提出一种基于分布式哈希表(DHT)的分布式子空间聚类(DISCLUS)算法,该算法对各结点存储的数据分别进行子空间聚类,对聚类结果进行合并,得到分布式系统的聚类结果.针对子空间聚类的特点,提出结果集缩减和结果集剪枝策略对结点间通讯进行优化.为实现结点聚类结果合并,提出分布式表决算法(DDV).该算法利用底层覆盖网的拓扑结构进行层次化表决信息收集,在动态网络环境中实现了对所有结点的无冗余覆盖.理论分析和实验表明,DISCLUS算法的聚类误差和通讯性能能够较好地适应系统数据集规模、网络规模和数据空间维度的增加. A distributed subspace clustering (DISCLUS) algorithm based on distributed Hash table(DHT) was proposed.Each node executed subspace clustering on its local data.Then the clustering results of nodes were combined to form the final clustering results of the distributed system.The dataset reducing and pruning schemes were proposed to optimize the communication between nodes according to the speciality of subspace clustering.A DHT-based distributed voting (DDV) algorithm was proposed to combine the clustering results of nodes.The algorithm used the topology of the underlying overlay to hierarchically collect the voting information.All the nodes in the system can be covered without redundancy.The theoretical and experimental results show that the clustering error and the communication cost of DISCLUS algorithm are scalable to the dataset,nodes and dimensionality.
出处 《浙江大学学报(工学版)》 EI CAS CSCD 北大核心 2010年第2期225-231,共7页 Journal of Zhejiang University:Engineering Science
基金 国家"863"高技术研究发展计划资助项目(2003AA1Z2130) 浙江省科技计划重大科技攻关资助项目(2005C11001-02)
关键词 对等网络 子空间聚类 分布式哈希表(DHT) 分布式表决 peer to peer subspace clustering distributed Hash table (DHT) distributed voting
  • 相关文献

参考文献14

  • 1AGRAWAL R, GEHRKE J, GUNOPULOS D, et al. Automatic subspace clustering of high dimensional data for data mining applications [C]// Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle, Washington: ACM, 1998:94-105.
  • 2PARSONS L, HAQUE E, LIU H, et al. Subspace clustering for high dimensional data: a review[J]. ACM SIGKDD Explorations Newsletter, 2004, 6 ( 1 ) : 90 - 105.
  • 3PATRIKAINEN A, MEILA M. Comparing subspace clusterings[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(7) : 902 - 916.
  • 4田敬,代亚非.P2P持久存储研究[J].软件学报,2007,18(6):1379-1399. 被引量:52
  • 5DATTA S, BHADURI K, GIANNELLA C, et al. Distributed data mining in peer-to-peer networks[J]. IEEE Internet Computing, 2006, 10(4) : 18 - 26.
  • 6BAWA M, GIONIS A, MOLINA H G, et al. The price of validity in dynamic networks[C]// Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. Paris, France: ACM, 2004:515- 526.
  • 7MEHYAR M, SPANOS D, PONGSAJAPAN J, et al. Asynchronous distributed averaging on communication networks[J]. IEEE/ACM Transactions on Networking, 2007, 15(3): 512-520.
  • 8KEMPE D, DOBRA A, GEHRKE J, et al. Computing aggregate information using Gossip[C]//Proceedings of the 44th IEEE Symposium on Foundations of Computer Science. Cambridge, MA: IEEE, 2003:482-491.
  • 9DATTA S, GIANNELLA C, KARGUPTA H, et al. K-means clustering over peer-to-peer networks[C]// Proceedings of the 8th International Workshop on High Performance and Distributed Mining. Newport Beach, CA: [s. n. ], 2005.
  • 10WOLFF R, BHADURI K, KARGUPTA H, et al. Local L2 thresholding based data mining in peer-to-peer systems [C]// Proceedings of the SIAM Conference on Data Mining. Bethesda, MD: [s. n.], 2006: 430-441.

二级参考文献56

  • 1Zhang Z,Lin S,Lian Q,Jin C.RepStore:A self-managing and self-tuning storage backend with smart bricks.In:Proc.of the Int'l Conf.on Autonomic Computing.2004.122-129.http://ieeexplore.ieee.org/xpl/freeabs_all.jsp-arnumber=1301355&fromcon
  • 2Stoica I,Morris R,Karger D,Kaashoek M,Balakrishnan H.Chord:A scalable peer-to-peer lookup service for internet applications.Proc.of the 2001 SIGCOMM Conf.,2001,31(4):149-160.
  • 3Zhao B,Kubiatowicz J,Joseph A.Tapestry:An infrastructure for fault-tolerant wide-area location and routing.Technical Report,UCB//CSD-01-1141,Berkeley Computer Science Division,University of California,2001.
  • 4Ratnasamy S,Francis P,Handley M,Karp R,Schenker S.A scalable content-addressable network.In:Proc.of the ACM SIGCOMM Symp.on Communication,Architecture,and Protocols.ACM SIGCOMM,2001.161-172.http://www.acm.org/sigs/ sigcomm/sigcomm/sigcomm2001/p13-ratnasamy.pdf
  • 5Rowstron A,Druschel P.Pastry:Scalable,distributed object location and routing for large-scale peer-to-peer systems.In:Proc.of the IFIP/ACM Int'l Conf.on Distributed Systems Platforms (Middleware).2001.329-350.http://citeseer.ist.psu.edu/ rowstron01pastry.html
  • 6Maymounkov P,Mazieres D.Kademlia:A peer-to-peer information system based on the XOR metric.In:Proc.of the 1st Int'l Workshop on Peer-to-Peer Systems.2002.258-263.http://citeseer.ist.psu.edu/maymounkov02kademlia.html
  • 7Schlosser M,Sintek M,Decker S,Nejdl W.HyperCuP-Hypercubes,ontologies and efficient search on P2P networks.In:Proc.of the Int'l Workshop on Agents and Peer-to-Peer Computing.2002.112-124.http://citeseer.ist.psu.edu/532386.html
  • 8Mitzenmacher M.Digital fountains:A survey and look forward.In:Proc.of the Information Theory Workshop.2004.271-276.http://ieeexplore.ieee.org/xpls/abs_all.jsp-arnumber=1405313
  • 9Plank J.A tutorial on reed-solomon coding for fault-tolerance in RAID-like systems.Software Practice and Experience,1997,27(9):995-1012.
  • 10Chun B,Dabek F,Haeberlen A,Sit E,Weatherspoon H,Kaashoek M,Kubiatowicz J,Morris R.Efficient replica maintenance for distributed storage systems.In:Proc.of the 3rd Symp.on Networked Systems Design and Implementation.2006.45-58.http://oceanstore.cs.berkeley.edu/publications/papers/pdf/carbonite06.pdf

共引文献51

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部