Research on Determination Algorithm of Clustering Number in Multi-dimensional Dataset
(Chinese title: 多维数据集中聚类数确定算法研究; cited by: 2)
Abstract: To better determine the optimal number of clusters in multi-dimensional data, this paper proposes a new algorithm, MHC, built on the principles of traditional algorithms for determining the number of clusters in a dataset. The algorithm uses a bottom-up strategy to generate partitions of the dataset at different levels, computes the clustering quality of the partition at each level, and selects the optimal number of clusters according to that quality. The paper also designs a new clustering validity index, Between-In-Proportion (BIP), which measures the clustering quality of different partitions and relies mainly on the geometric structure of the dataset. Theoretical analysis and experimental results show that the BIP index and the MHC algorithm are effective and can accurately determine the optimal number of clusters in multi-dimensional datasets.
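The abstract describes MHC only at a high level: build partitions bottom-up, score every level with a validity index, and keep the number of clusters whose partition scores best. The sketch below illustrates that loop under stated assumptions. Because the BIP formula is not given here, it substitutes the Calinski-Harabasz index (Calinski and Harabasz, 1974) as a stand-in; the centroid-linkage merging rule, the default bound k_max ≤ √n, and the helper names `centroid`, `sq_dist`, `calinski_harabasz`, and `best_cluster_count` are all hypothetical choices, not details taken from the paper.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(clusters):
    """Stand-in validity index (Calinski-Harabasz), NOT the paper's BIP index.

    Between-cluster dispersion over within-cluster dispersion, each divided
    by its degrees of freedom; a larger value means a better partition.
    """
    points = [p for c in clusters for p in c]
    n, k = len(points), len(clusters)
    overall = centroid(points)
    cents = [centroid(c) for c in clusters]
    between = sum(len(c) * sq_dist(m, overall) for c, m in zip(clusters, cents))
    within = sum(sq_dist(p, m) for c, m in zip(clusters, cents) for p in c)
    if within == 0:
        return math.inf  # every cluster has collapsed onto a single location
    return (between / (k - 1)) / (within / (n - k))


def best_cluster_count(points, k_min=2, k_max=None):
    """Merge clusters bottom-up and return the k whose partition scores best.

    Hypothetical helper: centroid linkage and the sqrt(n) default bound on
    k_max are illustrative assumptions, not details taken from the paper.
    """
    n = len(points)
    if k_min < 2 or n <= k_min:
        raise ValueError("need k_min >= 2 and more points than k_min")
    if k_max is None:
        k_max = max(k_min, math.isqrt(n))  # common k_max <= sqrt(n) heuristic
    k_max = min(k_max, n - 1)  # keeps n - k > 0 inside the index
    if k_max < k_min:
        raise ValueError("k_max must be at least k_min")

    clusters = [[p] for p in points]  # bottom level: one cluster per point
    scores = {}
    while True:
        k = len(clusters)
        if k <= k_max:
            scores[k] = calinski_harabasz(clusters)  # quality of this level
        if k == k_min:
            break
        # Merge the two clusters whose centroids are closest.
        cents = [centroid(c) for c in clusters]
        i, j = min(combinations(range(k), 2),
                   key=lambda ij: sq_dist(cents[ij[0]], cents[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Three well-separated unit squares of four points each.
    squares = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11),
               (20, 0), (20, 1), (21, 0), (21, 1)]
    k, scores = best_cluster_count(squares)
    print(k)  # 3
```

On the three-square example, the loop scores the levels k = 3 and k = 2 and picks 3. Swapping in a different validity index, such as BIP once its definition is known, would presumably only require replacing `calinski_harabasz`; the bottom-up generate-and-score loop stays the same.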
Source: Computer Engineering (《计算机工程》), 2012, Issue 9, pp. 8-11 (4 pages); indexed in CAS and CSCD
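Funding: Key Project of the National High Technology Research and Development Program of China ("863" Program) (2007AA010305); Natural Science Basic Research Program of Shaanxi Province (SJ08-ZT14); Scientific Research Program of the Shaanxi Provincial Department of Education (06JK229, 09JK683)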
Keywords: multi-dimensional dataset; clustering number; clustering validity index; hierarchical clustering
References (11)

  • 1. Wu Yuxia, Mou Yuanchao. Identification of Money-laundering Behavior Based on Two-phase Clustering[J]. Computer Engineering, 2010, 36(15): 60-62.
  • 2. Xie X L, Beni G. A Validity Measure for Fuzzy Clustering[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1991, 13(8): 841-847.
  • 3. Sun Haojun, Wang Shengrui, Jiang Qingshan. FCM-based Model Selection Algorithms for Determining the Number of Clusters[J]. Pattern Recognition, 2004, 37(10): 2027-2037.
  • 4. Kapp A V, Tibshirani R. Are Clusters Found in One Dataset Present in Another Dataset?[J]. Biostatistics, 2007, 8(1): 9-31.
  • 5. Woo K G, Lee J H, Kim M H, et al. FINDIT: A Fast and Intelligent Subspace Clustering Algorithm Using Dimension Voting[J]. Information and Software Technology, 2004, 46(4): 255-271.
  • 6. Chen Lifei, Jiang Qingshan, Wang Shengrui. A Method for Determining the Optimal Number of Clusters Based on Hierarchical Partitioning[J]. Journal of Software, 2008, 19(1): 62-72.
  • 7. Zhou Shibing, Xu Zhenyuan, Tang Xuqing. A Method for Determining the Optimal Number of Clusters for the K-means Algorithm[J]. Journal of Computer Applications, 2010, 30(8): 1995-1998.
  • 8. Foss A, Zaiane O R. A Parameterless Method for Efficiently Discovering Clusters of Arbitrary Shape in Large Datasets[C]//Proc. of ICDM'02. Los Alamitos, USA: IEEE Computer Society Press, 2002: 179-186.
  • 9. Agrawal R, Gehrke J, Gunopulos D, et al. Automatic Subspace Clustering of High Dimensional Data[J]. Data Mining and Knowledge Discovery, 2005, 11(1): 5-33.
  • 10. Hong Zhiling, Jiang Qingshan, Dong Huailin, Wang Shengrui. A New Index for Judging Cluster Validity in Fuzzy Clustering[J]. Computer Science, 2004, 31(10): 121-125.
