Research on Determination Algorithm of Clustering Number in Multi-dimensional Dataset (多维数据集中聚类数确定算法研究)

Cited by: 2

Abstract: Building on the principles of traditional algorithms for determining the number of clusters in a dataset, this paper proposes a new algorithm, MHC, aimed at finding the optimal number of clusters in multi-dimensional data. The algorithm uses a bottom-up strategy to generate partitions of the dataset at different levels, computes the clustering quality of the partition at each level, and selects the optimal number of clusters according to that quality. The paper also introduces a new clustering validity index, Between-In-Proportion (BIP), which measures the clustering quality of different partitions and depends mainly on the geometric structure of the dataset. Theoretical analysis and experimental results indicate that the BIP index and the MHC algorithm are effective and that the algorithm accurately determines the optimal number of clusters in multi-dimensional datasets.
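The bottom-up procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the BIP formula or the merge rule used by MHC, so everything here is an assumption made for illustration: the Calinski-Harabasz index (a well-known between/within dispersion ratio) stands in for BIP, clusters are merged by closest centroids, and the names `best_cluster_count`, `k_min`, and `k_max` are hypothetical. It is not the authors' implementation.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(clusters):
    """Stand-in validity index (NOT the paper's BIP): ratio of
    between-cluster to within-cluster dispersion; higher is better."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    overall = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def best_cluster_count(points, k_min=2, k_max=None):
    """Merge clusters bottom-up, score each level, return (best k, scores)."""
    if len(points) <= k_min:
        raise ValueError("need more points than k_min")
    if k_max is None:
        k_max = len(points) - 1
    clusters = [[tuple(p)] for p in points]  # start from singletons
    scores = {}
    while len(clusters) > k_min:
        # Assumed merge rule: join the two clusters with the closest centroids.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        # Score the partition at this level of the hierarchy.
        if len(clusters) <= k_max:
            scores[len(clusters)] = calinski_harabasz(clusters)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Three well-separated 2-D groups of four points each.
    pts = [(x + dx, y + dy)
           for (x, y) in [(0, 0), (10, 0), (0, 10)]
           for dx in (0, 1) for dy in (0, 1)]
    k, scores = best_cluster_count(pts, k_max=6)
    print(k, scores)
```

The sketch scores every level between `k_min` and `k_max` and keeps the level with the highest index value, mirroring the "generate partitions, evaluate each, pick the best" structure of the abstract. Swapping in a different validity index only requires replacing `calinski_harabasz`.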

Authors: Zhou Hongfang (周红芳), Li Hongyan (李红岩), Liu Ying (刘颖), Wang Xiaodong (王晓东)

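Affiliations: School of Computer Science and Engineering, Xi'an University of Technology (西安理工大学计算机科学与工程学院); School of Computer Science, Panzhihua University (攀枝花学院计算机学院); PLA Air Defense Forces Command College (解放军防空兵指挥学院)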

Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2012, No. 9, pp. 8-11 (4 pages)

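Funding: Key project of the National "863" Program (2007AA010305); Natural Science Basic Research Plan of Shaanxi Province (SJ08-ZT14); Scientific Research Program of the Shaanxi Provincial Department of Education (06JK229, 09JK683)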

Keywords: multi-dimensional dataset; clustering number; clustering validity index; hierarchical clustering

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
