Abstract
The sequential pattern mining algorithm PrefixSpan must construct a large number of projected databases during mining. To address this drawback, this paper proposes IPMSP (Improved PrefixSpan algorithm for Mining Sequential Patterns). During recursive mining, IPMSP checks the prefix of the sequence database with respect to the current prefix, which avoids constructing duplicate projected databases for the same frequent prefix pattern. IPMSP also discards infrequent items instead of storing them, and it stops scanning a projected database once the number of projected sequences falls below the minimum support. Together, these changes improve the time and space performance of PrefixSpan. Experimental results show that IPMSP outperforms PrefixSpan in both running time and memory usage.
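The abstract gives no pseudocode. The following Python sketch is only a rough illustration, not the authors' implementation. It shows plain PrefixSpan on sequences of single items (a simplifying assumption; PrefixSpan in general handles itemset elements) and adds the two pruning ideas the abstract attributes to IPMSP: infrequent items are dropped from projected sequences, and recursion stops once a projected database holds fewer sequences than the minimum support. The duplicate-projection check is not modeled, because the abstract does not specify how it works. The function name `prefixspan` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sequence = List[str]


def prefixspan(db: List[Sequence], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent sequential pattern with its support count.

    Simplifying assumption: each sequence element is a single item,
    not an itemset. This is a sketch, not the IPMSP algorithm itself.
    """
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[Sequence]) -> None:
        # Early stop (pruning idea from the abstract): with fewer projected
        # sequences than min_support, no extension of prefix can be frequent.
        if len(projected) < min_support:
            return
        # Support counts: each item is counted at most once per sequence.
        counts = Counter(item for seq in projected for item in set(seq))
        frequent = {item for item, c in counts.items() if c >= min_support}
        for item in sorted(frequent):
            new_prefix = prefix + (item,)
            patterns[new_prefix] = counts[item]
            new_projected: List[Sequence] = []
            for seq in projected:
                if item in seq:
                    # Projection: the suffix after the first occurrence of item.
                    suffix = seq[seq.index(item) + 1:]
                    # Pruning idea from the abstract: items that are infrequent
                    # here cannot occur in any frequent extension, so drop them.
                    suffix = [x for x in suffix if x in frequent]
                    # Empty suffixes cannot support further extensions.
                    if suffix:
                        new_projected.append(suffix)
            mine(new_prefix, new_projected)

    mine((), db)
    return patterns


if __name__ == "__main__":
    example_db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
    for pattern, support in sorted(prefixspan(example_db, 2).items()):
        print(pattern, support)
```

With `min_support = 2`, the example database yields the single-item patterns `a`, `b`, and `c` (support 3 each) and the two-item patterns `a b`, `a c`, and `b c` (support 2 each). The pattern `a b c` occurs in only one sequence and is therefore not reported. Dropping infrequent items is safe because the projected database of a prefix contains every later occurrence that any extension of that prefix could use.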
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2009, No. 23, pp. 56-58, 61 (4 pages)