
High Dimensional Clustering Algorithm Based on Local Significant Units (局部显著单元高维聚类算法)

Cited by: 1
Abstract: High dimensional clustering algorithms built on equal-width or random-width grid density units cannot guarantee the quality of clustering results on complicated data sets. Based on kernel density estimation and spatial statistics theory, this paper presents a High dimensional Clustering algorithm based on Local Significant Units (HC_LSU) for clustering complex high dimensional data. First, a structure called the Local Significant Unit (LSU) is defined through local kernel density estimation and a spatial statistical test to capture the local data distribution. Second, a greedy algorithm, the Greedy Algorithm for LSU (GA_LSU), is designed to quickly discover the local significant regions that cover the data distribution. Finally, the single-linkage algorithm is run on the local significant units that share the same attribute subset to produce the clustering results. Experimental results on 4 synthetic and 6 real-world data sets show that HC_LSU can effectively discover the high quality clusters hidden in highly complicated data sets.
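As a rough illustration of the three steps described in the abstract (estimate local density, keep only the statistically significant regions, then merge them with single-linkage), the Python sketch below keeps the points whose kernel density estimate clearly exceeds a uniform baseline and single-links what remains. This is a minimal sketch under simplifying assumptions, not the authors' HC_LSU/GA_LSU implementation: the uniform null model, the thresholds, and the function names (significant_points, cluster_lsu_like) are hypothetical stand-ins for the paper's local significant unit construction and spatial statistical test.

    import numpy as np
    from scipy.stats import gaussian_kde
    from scipy.cluster.hierarchy import linkage, fcluster

    def significant_points(points, z_thresh=2.0):
        # Local kernel density estimate at every sample (Gaussian KDE, Scott's rule).
        kde = gaussian_kde(points.T)
        dens = kde(points.T)
        # Density expected if the data were spread uniformly over its bounding box;
        # this crude uniform null stands in for the paper's spatial statistical test.
        baseline = 1.0 / np.prod(points.max(axis=0) - points.min(axis=0))
        keep = dens > baseline + z_thresh * dens.std()
        return points[keep]

    def cluster_lsu_like(points, z_thresh=2.0, merge_dist=0.5):
        # Single-linkage over the significant points, cut at a fixed merge distance.
        sig = significant_points(points, z_thresh)
        labels = fcluster(linkage(sig, method="single"),
                          t=merge_dist, criterion="distance")
        return sig, labels

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Two dense Gaussian blobs plus uniform background noise.
        data = np.vstack([rng.normal([0, 0], 0.3, size=(200, 2)),
                          rng.normal([5, 5], 0.3, size=(200, 2)),
                          rng.uniform(-2.0, 7.0, size=(100, 2))])
        sig, labels = cluster_lsu_like(data)
        print(len(sig), "significant points grouped into", labels.max(), "clusters")

On the demo data the density filter discards most of the background noise and single-linkage then recovers the two dense blobs; the actual algorithm additionally searches attribute subsets, which a full-space KDE sketch like this does not attempt.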
Source: Journal of Electronics & Information Technology (《电子与信息学报》), 2010, Issue 11: 2707-2712 (6 pages). Indexed by EI, CSCD, and the Peking University Core Journals list.
Funding: Key Program of the National Natural Science Foundation of China (90715037); National 973 Program (2007CB714205); Australian Research Council grant (DP0770479); Key Projects of the Anhui Provincial Department of Education (KJ2009A54, KJ2010A325).
Keywords: Clustering analysis; High dimensional Clustering (HC) algorithm; Kernel density estimation; Local Significant Unit (LSU)

References (16)

  • 1 Sun Ji-gui, Liu Jie, and Zhao Lian-yu. Clustering algorithms research[J]. Journal of Software, 2008(1): 48-61. (Cited by: 1074)
  • 2 Hinneburg A and Keim D A. An efficient approach to clustering in large multimedia databases with noise[C]. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, New York: AAAI Press, 1998: 58-68.
  • 3 Hinneburg A and Gabriel H H. DENCLUE 2.0: Fast clustering based on kernel density estimation[C]. IDA, 2007, LNCS 4723: 70-80.
  • 4 Chaoji V, Hasan M A, Salem S, et al. SPARCL: Efficient and effective shape-based clustering[C]. Proceedings of the 8th IEEE International Conference on Data Mining, Pisa, Italy, 2008: 93-102.
  • 5 Pei T, Jasra A, Hand D J, et al. DECODE: A new method for discovering clusters of different densities in spatial data[J]. Data Mining and Knowledge Discovery, 2009, 18(3): 337-369.
  • 6 Kriegel H P, Kröger P, and Zimek A. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering[J]. ACM Transactions on Knowledge Discovery from Data (TKDD), 2009, 3(1): 1-58.
  • 7 Ng K, Fu A, and Wong C W. Projective clustering by histograms[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(3): 369-383.
  • 8 Moise G, Sander J, and Ester M. Robust projected clustering[J]. Knowledge and Information Systems, 2008, 14: 273-298.
  • 9 Moise G and Sander J. Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projective and subspace clustering[C]. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'08), Las Vegas, 2008: 533-541.
  • 10 Liu H, Lafferty J, and Wasserman L. Sparse nonparametric density estimation in high dimensions using the rodeo[C]. 11th International Conference on Artificial Intelligence and Statistics, AISTATS, Florida, 2007: 1049-1062.

Co-cited references (18)

  • 1 He Zhi, Tian Sheng-feng, and Huang Hou-kuan. A method for mining two-dimensional optimized association rules over numeric attributes (in English)[J]. Journal of Software, 2007, 18(10): 2528-2537. (Cited by: 5)
  • 2 Srikant R and Agrawal R. Mining quantitative association rules in large relational tables[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, 1996, 25: 1-12.
  • 3 Agrawal R, Imielinski T, and Swami A. Mining association rules between sets of items in large databases[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, 1993, 22: 207-216.
  • 4 Donoho D L. High-dimensional data analysis: The curses and blessings of dimensionality[R]. Los Angeles, 2000.
  • 5 Beyer K, Goldstein J, Ramakrishnan R, et al. When is "nearest neighbor" meaningful?[C]. Proceedings of the International Conference on Database Theory, Jerusalem, 1999: 217-235.
  • 6 Agrawal R, Gehrke J, Gunopulos D, et al. Automatic subspace clustering of high dimensional data for data mining applications[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, New York, 1998, 27: 94-105.
  • 7 Aggarwal C C and Yu P S. Finding generalized projected clusters in high dimensional spaces[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, 2000, 29: 70-81.
  • 8 Liu Qingbao and Dong Guozhu. CPCQ: Contrast pattern based clustering quality index for categorical data[J]. Pattern Recognition, 2012, 45(4): 1739-1748.
  • 9 Poon Leonard K M, Zhang Nevin L, Liu Tengfei, et al. Model-based clustering of high-dimensional data: Variable selection versus facet determination[OL]. http://www.sciencedirect.com/science/article/pii/S0888613X12001429, 2012.
  • 10 Browne R P and McNicholas P D. Model-based clustering, classification, and discriminant analysis of data with mixed type[J]. Journal of Statistical Planning and Inference, 2012, 142(11): 2976-2984.
