Research of Clustering Initial Center Selection
Abstract: This paper studies how to recluster a sequence database, and rediscover patterns in it, using frequent sequential patterns that have already been mined. The existing K-means clustering algorithm selects its initial centers at random, which makes the clustering results unstable. To address this shortcoming, the paper proposes K-SPAM (K-means algorithm of sequence pattern mining based on the Huffman Method), an algorithm that selects the initial centers following the Huffman idea. The algorithm reduces, to a certain extent, the chance of becoming trapped in a local optimum. In addition, it computes the similarity between sequences with efficient "AND" and "OR" operations, which can greatly improve execution efficiency.
Source: Journal of Nanjing Normal University (Natural Science Edition), 2010, No. 4, pp. 161-165 (5 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Northwest Normal University Key Discipline Fund, 2006-2010 (2007C04)
Keywords: K-means; sequential patterns; Huffman tree; clustering; initial center
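
The abstract names two ideas without giving their details: similarity computed with "AND" and "OR" operations, and initial centers chosen in the spirit of Huffman tree construction. The Python sketch below is only one possible reading, not the published K-SPAM procedure. It assumes each sequence is encoded as an integer bitmap whose set bits mark the frequent patterns it contains, that similarity is the ratio of shared to combined patterns, and that two merged items are represented by the union of their bitmaps. The names `similarity` and `huffman_style_centers` are hypothetical.

```python
from itertools import combinations


def similarity(a: int, b: int) -> float:
    """Shared patterns (AND) divided by combined patterns (OR).

    Assumption: a and b are bitmaps of the frequent patterns in each sequence.
    """
    union = a | b
    if union == 0:
        return 0.0  # two empty bitmaps share nothing
    return bin(a & b).count("1") / bin(union).count("1")


def huffman_style_centers(bitmaps: list[int], k: int) -> list[int]:
    """Merge the two most similar items repeatedly until k candidates remain.

    This mirrors how Huffman construction repeatedly combines two nodes;
    the merge rule (bitwise OR) is an assumption made for illustration.
    """
    if not 1 <= k <= len(bitmaps):
        raise ValueError("k must be between 1 and the number of sequences")
    centers = list(bitmaps)
    while len(centers) > k:
        # Pick the pair of positions with the highest similarity.
        i, j = max(
            combinations(range(len(centers)), 2),
            key=lambda p: similarity(centers[p[0]], centers[p[1]]),
        )
        merged = centers[i] | centers[j]
        centers = [c for idx, c in enumerate(centers) if idx not in (i, j)]
        centers.append(merged)
    return centers


if __name__ == "__main__":
    data = [0b0011, 0b0111, 0b1100, 0b1000]
    print([bin(c) for c in huffman_style_centers(data, k=2)])
```

Because the candidates are derived deterministically from the data, repeated runs start K-means from the same centers, which is the stability property the abstract is after. The pairwise search here is quadratic per merge and is meant for illustration only.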

