Abstract
This paper studies the problem of further knowledge discovery in a sequence database based on the results of sequential pattern mining, and proposes SPSC, a method that clusters the data sequences in the database using the sequential patterns already discovered. The method uses the discovered patterns to define a similarity function between data sequences and the mean of a group of data sequences, so that the classical k-means clustering method can be applied to sequence data and data sequences containing similar patterns are grouped together. Theoretical analysis and experiments indicate that, compared with existing sequence clustering methods, the proposed method produces better-optimized clusters and is more efficient.
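The abstract does not give the SPSC definitions themselves, so the following is only a minimal sketch of the general idea, under stated assumptions. Each data sequence is simplified to a list of single items, a sequence is represented by the set of mined patterns it contains, similarity is taken to be a generalized Jaccard measure, and the cluster mean is the per-pattern average. The names `contains`, `to_vector`, `similarity`, `mean`, and `cluster` are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(seq: Sequence[str], pattern: Pattern) -> bool:
    """True if `pattern` occurs in `seq` as an ordered, possibly gapped, subsequence."""
    it = iter(seq)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def to_vector(seq: Sequence[str], patterns: List[Pattern]) -> List[float]:
    """Assumed representation: 1.0 for each mined pattern the sequence contains."""
    return [1.0 if contains(seq, p) else 0.0 for p in patterns]


def similarity(u: List[float], v: List[float]) -> float:
    """Assumed similarity: generalized Jaccard, sum of minima over sum of maxima."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den if den else 0.0


def mean(vectors: List[List[float]]) -> List[float]:
    """Assumed cluster mean: fraction of members containing each pattern."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster(seqs: List[Sequence[str]], patterns: List[Pattern],
            k: int, iters: int = 20) -> List[int]:
    """k-means-style loop: assign to the most similar mean, then recompute means."""
    vecs = [to_vector(s, patterns) for s in seqs]
    centers = [vecs[i] for i in range(k)]  # assumption: first k sequences as seeds
    labels: Optional[List[int]] = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda c: similarity(v, centers[c]))
                      for v in vecs]
        if new_labels == labels:  # assignments are stable, stop early
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)
    return labels if labels is not None else []


patterns = [("a", "b"), ("c", "d")]
seqs = [list("axb"), list("cd"), list("ab"), list("cxd")]
labels = cluster(seqs, patterns, k=2)
```

In this toy run the sequences containing the pattern `("a", "b")` end up in one cluster and those containing `("c", "d")` in the other. Replacing `similarity` and `mean` with the paper's own definitions would leave the k-means loop unchanged.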
Source
Journal of Hefei University of Technology: Natural Science (《合肥工业大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2008, No. 1, pp. 9-12 (4 pages)
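Funding
Anhui Provincial Natural Science Foundation project (050420207)
Hefei University of Technology Scientific Research Development Fund project (050504F)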
Keywords
data mining
sequential pattern
clustering