Abstract
GSP and PSP are the two main algorithms for mining sequential patterns, but neither supports incremental mining and both are relatively inefficient. This paper presents a data structure called the Heterogeneity Tree, together with a set of rules for numbering its branches. The rules ensure that branches with the same number represent the same candidate sequence and branches with different numbers represent different candidate sequences, which greatly simplifies counting the support of candidates. On this basis, NPSP, an efficient sequential pattern mining algorithm with incremental mining capability, is proposed. The completeness of its mined result set and the efficiency of the algorithm are established through theoretical analysis and experiments.
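The abstract does not spell out the Heterogeneity Tree or its numbering rules, so the following is only a minimal sketch of the underlying idea: give identical candidate sequences the same number, then count support per number, and update the counts when new data sequences arrive. A plain dictionary stands in for the tree, and the names `CandidateCounter`, `register`, and `add_sequences` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def is_subsequence(candidate, data_sequence):
    """Return True if the candidate's itemsets are contained, in order,
    in distinct itemsets of the data sequence (greedy earliest match)."""
    pos = 0
    for itemset in data_sequence:
        if pos < len(candidate) and candidate[pos] <= itemset:
            pos += 1
    return pos == len(candidate)


class CandidateCounter:
    """Hypothetical stand-in for the numbered branches: a dictionary maps
    each distinct candidate sequence to one integer id."""

    def __init__(self):
        self.ids = {}                    # canonical candidate -> id
        self.candidates = []             # id -> canonical candidate
        self.support = defaultdict(int)  # id -> support count

    def register(self, candidate):
        # Canonical form: a tuple of frozensets, so equal candidates
        # always produce the same key and therefore the same id.
        key = tuple(frozenset(s) for s in candidate)
        if key not in self.ids:
            self.ids[key] = len(self.candidates)
            self.candidates.append(key)
        return self.ids[key]

    def add_sequences(self, data_sequences):
        # Incremental update: only the newly supplied sequences are scanned.
        # Assumes all candidates were registered before any counting.
        for seq in data_sequences:
            seq = [frozenset(s) for s in seq]
            for cid, cand in enumerate(self.candidates):
                if is_subsequence(cand, seq):
                    self.support[cid] += 1


counter = CandidateCounter()
ab = counter.register([{"a"}, {"b"}])       # <(a)(b)>
ab_again = counter.register([["a"], ["b"]])  # same candidate, same id
a_and_b = counter.register([{"a", "b"}])    # <(a b)>, a different candidate
counter.add_sequences([[{"a"}, {"c"}, {"b"}], [{"a", "b"}]])
counter.add_sequences([[{"a"}, {"b"}]])      # a later batch of data
```

Because support is stored per id, a later batch only adds to the existing counts instead of forcing a rescan of the earlier data, which is the general motivation behind incremental mining.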
Source
Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》)
CAS
2004, No. 4, pp. 22-26 (5 pages)
Funding
Supported by an Australian ARC grant (DP0343109)
Keywords
data mining
sequential patterns
NPSP algorithm
incremental data mining