Abstract
To address the high cost of constructing projected databases in the PrefixSpan algorithm, this paper proposes SPMIP, a sequential pattern mining algorithm based on an improved PrefixSpan. SPMIP adds a pruning step and reduces the scanning performed while generating certain specific sequential patterns. This shrinks the projected databases and shortens the time spent scanning them, improving efficiency while still producing the required sequential patterns. Experimental results show that SPMIP is more efficient than PrefixSpan and that the mined sequential patterns are unaffected.
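For orientation, the baseline that SPMIP improves on can be sketched as follows. This is a minimal PrefixSpan for sequences of single items, not the SPMIP algorithm itself: the abstract does not specify the pruning rule, so the size check on the projected database below is only an illustrative assumption, and the name `prefixspan` and its parameters are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items."""
    results = {}

    def mine(prefix, projected):
        # Illustrative pruning (an assumption, not the paper's rule):
        # a projected database with fewer than min_support suffixes
        # cannot contain any frequent extension of the prefix.
        if len(projected) < min_support:
            return

        # Count each item at most once per suffix.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support

            # Project on the first occurrence of item; empty suffixes
            # are dropped because they cannot extend the pattern.
            next_db = []
            for suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:
                        next_db.append(rest)
            mine(pattern, next_db)

    mine((), [tuple(s) for s in sequences])
    return results


patterns = prefixspan(["abc", "abd", "acb"], min_support=2)
```

With this toy input, `patterns` maps each frequent pattern to its support, for example `('a', 'b')` to 3. The size check skips projected databases that are already too small, which leaves the mined patterns unchanged while avoiding some scans.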
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2011, Issue 9, pp. 2405-2407 (3 pages)
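Funding
National Natural Science Foundation of China (60873247)
Shandong Province High-Tech Independent Innovation Special Project (2008ZZ28)
Natural Science Foundation of Shandong Province (ZR2009GZ007)
Science and Technology Program of the Shandong Provincial Department of Education (J09LG52)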