Abstract
The set of frequent closed sequential patterns uniquely determines the complete set of frequent sequential patterns and is usually much smaller. Traditional closed sequential pattern mining algorithms extend a frequent sequence with every frequent item, which often generates a large number of non-closed sequences. To address this problem, a new algorithm for mining frequent closed sequential patterns based on a subsume index is proposed. Its main idea is to extend a frequent sequence with closed itemsets only, which greatly reduces the generation of non-closed sequences. First, it is proved that a closed sequential pattern can consist only of closed itemsets. Second, it is shown how the subsume index can be used to discover closed itemsets quickly. Finally, a depth-first algorithm for mining frequent closed sequential patterns is presented. Experimental results show that the proposed algorithm is efficient.
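The role of the subsume index in finding closed itemsets can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes the common definition in which the subsume index of an item x is the set of other items whose supporting transactions include every transaction that supports x, so that the closure of the single item x is x together with its subsume index. The function names (`tidsets`, `subsume_index`, `item_closure`) and the toy data are hypothetical, and treating each itemset of a sequence as one transaction is likewise an assumption.

```python
from collections import defaultdict


def tidsets(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)


def subsume_index(transactions):
    """For each item x, collect the other items y with tidset(x) <= tidset(y).

    Assumed definition: y is in subsume(x) when every transaction
    containing x also contains y.
    """
    tids = tidsets(transactions)
    return {
        x: {y for y in tids if y != x and tids[x] <= tids[y]}
        for x in tids
    }


def item_closure(item, transactions):
    """Closure of a single item: the item plus everything it always co-occurs with."""
    return frozenset({item} | subsume_index(transactions)[item])


# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"b"}]
index = subsume_index(transactions)
```

In this toy data, `a` never occurs without `b`, so the itemset containing only `a` is not closed, and a miner that extends sequences with closed itemsets only would use `{a, b}` in its place.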
Source
Systems Engineering and Electronics (《系统工程与电子技术》)
EI
CSCD
PKU Core Journal
2009, No. 10, pp. 2485-2488 (4 pages)
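Funding
National Natural Science Foundation of China (60675030)
Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality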
Keywords
data mining
frequent closed itemset
frequent closed sequential pattern
subsume index