
High Dimensional Clustering Algorithm Based on Local Significant Units (局部显著单元高维聚类算法)

Cited by: 1
Abstract: High dimensional clustering algorithms built on equal-width or random-width grid density units cannot guarantee the quality of clustering results on complicated data sets. Based on kernel density estimation and spatial statistics theory, this paper presents a High dimensional Clustering algorithm based on Local Significant Units (HC_LSU) for clustering complex high dimensional data. First, a structure called the Local Significant Unit (LSU) is defined through local kernel density estimation and a spatial statistical test to capture the local data distribution. Second, a greedy algorithm, the Greedy Algorithm for LSU (GA_LSU), is designed to quickly discover the local significant regions that cover the data distribution. Finally, the single-linkage algorithm is run on the local significant units that share the same attribute subset to produce the clustering results. Experimental results on 4 synthetic and 6 real-world data sets show that HC_LSU can effectively discover the high quality clusters hidden in highly complicated data sets.
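As a rough illustration of the three steps described in the abstract (estimate local density, keep only the statistically significant regions, then merge them with single-linkage), the Python sketch below keeps the points whose kernel density estimate clearly exceeds a uniform baseline and single-links what remains. This is a minimal sketch under simplifying assumptions, not the authors' HC_LSU/GA_LSU implementation: the uniform null model, the thresholds, and the function names (significant_points, cluster_lsu_like) are hypothetical stand-ins for the paper's local significant unit construction and spatial statistical test.

    import numpy as np
    from scipy.stats import gaussian_kde
    from scipy.cluster.hierarchy import linkage, fcluster

    def significant_points(points, z_thresh=2.0):
        # Local kernel density estimate at every sample (Gaussian KDE, Scott's rule).
        kde = gaussian_kde(points.T)
        dens = kde(points.T)
        # Density expected if the data were spread uniformly over its bounding box;
        # this crude uniform null stands in for the paper's spatial statistical test.
        baseline = 1.0 / np.prod(points.max(axis=0) - points.min(axis=0))
        keep = dens > baseline + z_thresh * dens.std()
        return points[keep]

    def cluster_lsu_like(points, z_thresh=2.0, merge_dist=0.5):
        # Single-linkage over the significant points, cut at a fixed merge distance.
        sig = significant_points(points, z_thresh)
        labels = fcluster(linkage(sig, method="single"),
                          t=merge_dist, criterion="distance")
        return sig, labels

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Two dense Gaussian blobs plus uniform background noise.
        data = np.vstack([rng.normal([0, 0], 0.3, size=(200, 2)),
                          rng.normal([5, 5], 0.3, size=(200, 2)),
                          rng.uniform(-2.0, 7.0, size=(100, 2))])
        sig, labels = cluster_lsu_like(data)
        print(len(sig), "significant points grouped into", labels.max(), "clusters")

On the demo data the density filter discards most of the background noise and single-linkage then recovers the two dense blobs; the actual algorithm additionally searches attribute subsets, which a full-space KDE sketch like this does not attempt.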
Source: Journal of Electronics & Information Technology (《电子与信息学报》), 2010, Issue 11: 2707-2712 (6 pages). Indexed by EI, CSCD, and the Peking University Core Journals list.
Funding: Key Program of the National Natural Science Foundation of China (90715037); National 973 Program (2007CB714205); Australian Research Council grant (DP0770479); Key Projects of the Anhui Provincial Department of Education (KJ2009A54, KJ2010A325).
Keywords: Clustering analysis; High dimensional Clustering (HC) algorithm; Kernel density estimation; Local Significant Unit (LSU)

References (16)

  • 1 Sun Ji-gui, Liu Jie, and Zhao Lian-yu. Clustering algorithms research[J]. Journal of Software, 2008(1): 48-61. (Cited by: 1074)
  • 2 Hinneburg A and Keim D A. An efficient approach to clustering in large multimedia databases with noise[C]. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, New York: AAAI Press, 1998: 58-68.
  • 3 Hinneburg A and Gabriel H H. DENCLUE 2.0: Fast clustering based on kernel density estimation[C]. IDA, 2007, LNCS 4723: 70-80.
  • 4 Chaoji V, Hasan M A, Salem S, et al. SPARCL: Efficient and effective shape-based clustering[C]. Proceedings of the 8th IEEE International Conference on Data Mining, Pisa, Italy, 2008: 93-102.
  • 5 Pei T, Jasra A, Hand D J, et al. DECODE: A new method for discovering clusters of different densities in spatial data[J]. Data Mining and Knowledge Discovery, 2009, 18(3): 337-369.
  • 6 Kriegel H P, Kröger P, and Zimek A. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering[J]. ACM Transactions on Knowledge Discovery from Data (TKDD), 2009, 3(1): 1-58.
  • 7 Ng K, Fu A, and Wong C W. Projective clustering by histograms[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(3): 369-383.
  • 8 Moise G, Sander J, and Ester M. Robust projected clustering[J]. Knowledge and Information Systems, 2008, 14: 273-298.
  • 9 Moise G and Sander J. Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projective and subspace clustering[C]. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'08), Las Vegas, 2008: 533-541.
  • 10 Liu H, Lafferty J, and Wasserman L. Sparse nonparametric density estimation in high dimensions using the rodeo[C]. 11th International Conference on Artificial Intelligence and Statistics, AISTATS, Florida, 2007: 1049-1062.

Co-cited references (18)

  • 1 He Zhi, Tian Sheng-feng, and Huang Hou-kuan. A method for mining two-dimensional optimized association rules over numeric attributes (in English)[J]. Journal of Software, 2007, 18(10): 2528-2537. (Cited by: 5)
  • 2 Srikant R and Agrawal R. Mining quantitative association rules in large relational tables[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, 1996, 25: 1-12.
  • 3 Agrawal R, Imielinski T, and Swami A. Mining association rules between sets of items in large databases[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, 1993, 22: 207-216.
  • 4 Donoho D L. High-dimensional data analysis: The curses and blessings of dimensionality[R]. Los Angeles, 2000.
  • 5 Beyer K, Goldstein J, Ramakrishnan R, et al. When is "nearest neighbor" meaningful?[C]. Proceedings of the International Conference on Database Theory, Jerusalem, 1999: 217-235.
  • 6 Agrawal R, Gehrke J, Gunopulos D, et al. Automatic subspace clustering of high dimensional data for data mining applications[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, New York, 1998, 27: 94-105.
  • 7 Aggarwal C C and Yu P S. Finding generalized projected clusters in high dimensional spaces[C]. Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, 2000, 29: 70-81.
  • 8 Liu Qingbao and Dong Guozhu. CPCQ: Contrast pattern based clustering quality index for categorical data[J]. Pattern Recognition, 2012, 45(4): 1739-1748.
  • 9 Poon Leonard K M, Zhang Nevin L, Liu Tengfei, et al. Model-based clustering of high-dimensional data: Variable selection versus facet determination[OL]. http://www.sciencedirect.com/science/article/pii/S0888613X12001429, 2012.
  • 10 Browne R P and McNicholas P D. Model-based clustering, classification, and discriminant analysis of data with mixed type[J]. Journal of Statistical Planning and Inference, 2012, 142(11): 2976-2984.
