
More Efficient Clustering Algorithm over Uncertain Data (cited by: 2)
Abstract: Clustering uncertain data is an active research topic in data mining with far-reaching practical applications. This paper reviews the uk-means algorithm for clustering uncertain data and its improved variant, ck-means. Because ck-means must compute the distance from the centroid of every object to every cluster, its efficiency remains poor when the data set is large. The proposed kd-means algorithm computes the distance from each object to only a subset of the cluster centroids, which substantially improves on the efficiency of ck-means. The method is an improvement strategy built on a kd-tree index, and extensive experiments demonstrate its effectiveness.
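The abstract does not spell out how kd-means selects the subset of centroids, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes squared Euclidean distance, under which an object's expected distance to a cluster center differs from the squared distance between its expected centroid and that center only by a term independent of the center (the property ck-means relies on). It also assumes that each uncertain object is given as equally weighted sample points. The kd-tree here is built over the cluster centers, which may differ from the index used in the paper, and all function names are hypothetical.

```python
def expected_centroid(samples):
    """Mean of an uncertain object's sample points (equal weights assumed)."""
    dim = len(samples[0])
    return tuple(sum(p[d] for p in samples) / len(samples) for d in range(dim))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kd_tree(items, depth=0):
    """Build a kd-tree from a list of (center_id, point) pairs.

    Each node is a tuple: ((center_id, point), axis, left, right).
    """
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kd_tree(items[:mid], depth + 1),
            build_kd_tree(items[mid + 1:], depth + 1))


def nearest_center(node, query, best=None, counter=None):
    """Return (squared_distance, center_id) of the center nearest to query.

    A subtree is skipped (pruned) when the splitting plane is already
    farther from the query than the best center found so far.
    """
    if node is None:
        return best
    (cid, point), axis, left, right = node
    if counter is not None:
        counter[0] += 1  # one distance computation
    d = sq_dist(query, point)
    if best is None or d < best[0]:
        best = (d, cid)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_center(near, query, best, counter)
    if diff * diff < best[0]:  # far side can still hold a closer center
        best = nearest_center(far, query, best, counter)
    return best


def assign_objects(objects, centers):
    """One assignment step: label each uncertain object with its nearest center.

    Returns the labels and the number of distance computations performed,
    which is never more than len(objects) * len(centers).
    """
    tree = build_kd_tree(list(enumerate(centers)))
    counter = [0]
    labels = []
    for samples in objects:
        mu = expected_centroid(samples)
        labels.append(nearest_center(tree, mu, counter=counter)[1])
    return labels, counter[0]
```

Calling `assign_objects(objects, centers)` inside the usual k-means loop (assign, then recompute centers) gives the same assignments as comparing every expected centroid against every center, while the returned counter shows how many comparisons the pruning avoided.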
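Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), indexed in CAS and PKU Core (北大核心), 2011, No. 2, pp. 161-166 (6 pages)
Funding: National Natural Science Foundation of China (61063008); Research Fund of the Yunnan Provincial Department of Education (09Y0048); Scientific Research Fund of Yunnan University (2009F29Q)
Keywords: kd-tree; ck-means algorithm; expected centroid; candidate set; pruning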
