Abstract
This paper studies re-clustering and rediscovery over a sequence database, using the frequent sequential patterns already mined from it. Existing K-means clustering algorithms select their initial centers at random, which makes the clustering results unstable. To address this shortcoming, the paper proposes K-SPAM (K-means algorithm of sequence pattern mining based on the Huffman Method), an algorithm that selects the initial centers using the Huffman idea. The algorithm reduces, to some extent, the chance of being trapped in a local optimum. In addition, the similarity between sequences is computed with efficient "AND" and "OR" operations, which greatly improves the algorithm's execution efficiency.
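The abstract does not give the details of either step, so the following is only a minimal sketch under stated assumptions, not the paper's K-SPAM implementation. It assumes each sequence is encoded as an integer bit mask whose set bits mark the frequent patterns it contains, that similarity is the ratio of set bits in the AND to set bits in the OR (a Jaccard-style measure), and that "Huffman idea" means repeatedly merging the two most similar items until K remain. The names `similarity` and `huffman_style_centers` are hypothetical.

```python
def similarity(a: int, b: int) -> float:
    """Assumed measure: popcount(a AND b) / popcount(a OR b)."""
    union = a | b
    if union == 0:
        return 0.0  # two empty masks share nothing by convention
    return bin(a & b).count("1") / bin(union).count("1")


def huffman_style_centers(masks: list[int], k: int) -> list[int]:
    """Merge the two most similar masks until k remain (assumed scheme).

    Mirrors Huffman tree construction, where the two "cheapest" nodes
    are combined at each step. Merging by bitwise OR is an assumption.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    centers = list(masks)
    while len(centers) > k:
        best = None  # (similarity, i, j) of the closest pair so far
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                s = similarity(centers[i], centers[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = centers[i] | centers[j]
        # Drop the merged pair and append their combination.
        centers = [c for idx, c in enumerate(centers) if idx not in (i, j)]
        centers.append(merged)
    return centers


if __name__ == "__main__":
    seeds = huffman_style_centers([0b0011, 0b0111, 0b1100, 0b1000], k=2)
    print([bin(s) for s in seeds])
```

Because the seeds are derived deterministically from the data rather than drawn at random, repeated runs start from the same centers, which is the stability property the abstract is after. The pairwise search here is quadratic per merge and is kept simple for readability.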
Source
Journal of Nanjing Normal University (Natural Science Edition)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2010, No. 4, pp. 161-165 (5 pages)
Funding
Northwest Normal University Key Discipline Fund, 2006-2010 (2007C04)