New mining algorithm for frequent closed sequential pattern based on subsume index (cited by: 1)


Abstract: The set of frequent closed sequential patterns uniquely determines the complete set of frequent sequential patterns and is usually much smaller. Traditional closed sequential pattern mining algorithms extend a frequent sequence with every frequent item, which tends to generate a large number of non-closed sequences. To address this problem, a new algorithm for mining frequent closed sequential patterns based on a subsume index is proposed. Its main idea is to extend a frequent sequence with closed itemsets only, which greatly reduces the generation of non-closed sequences. First, it is proved that a closed sequential pattern can consist only of closed itemsets. Second, it is shown how the subsume index can be used to discover closed itemsets quickly. Finally, a depth-first algorithm for mining frequent closed sequential patterns is presented. Experimental results show that the proposed algorithm is efficient.
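As an illustration of how a subsume index can expose closed itemsets, the following is a minimal sketch and not the authors' implementation. It assumes a simplified definition in which the subsume index of an item i lists every other item j that occurs in all transactions containing i; under that assumption, the closure of the single item i is i together with its subsume index. The function names (`tidsets`, `subsume_index`, `closure_of_item`) and the toy transactions are hypothetical, and the sketch works on plain itemset transactions rather than on sequences.

```python
from collections import defaultdict


def tidsets(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)


def subsume_index(transactions):
    """Simplified subsume index (an assumption, see the lead-in):
    for each item i, the other items j whose tidset contains tidset(i)."""
    tids = tidsets(transactions)
    return {
        i: {j for j in tids if j != i and tids[i] <= tids[j]}
        for i in tids
    }


def closure_of_item(item, subsume):
    """Closure of the single-item set {item}: the item plus every item
    that appears in all transactions where the item appears."""
    return frozenset({item}) | subsume[item]


# Toy data, invented for illustration only.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"b"}]
subsume = subsume_index(transactions)
closed_a = closure_of_item("a", subsume)
closed_b = closure_of_item("b", subsume)
```

Here item `a` only ever occurs together with `b`, so `{a}` alone is not closed and its closure is `{a, b}`, whereas `{b}` is already closed. Extending sequences only with such closures, rather than with every frequent item, is the kind of saving the abstract describes.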

Authors: Li Jinhong (李晋宏), Yang Bingru (杨炳儒), Song Wei (宋威), Hou Wei (侯伟)

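Affiliations: School of Information Engineering, University of Science and Technology Beijing; College of Information Engineering, North China University of Technology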

Source: Systems Engineering and Electronics (《系统工程与电子技术》; indexed in EI, CSCD, and the Peking University core journal list), 2009, Issue 10, pp. 2485-2488 (4 pages)

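Funding: National Natural Science Foundation of China (60675030); Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality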

Keywords: data mining; frequent closed itemset; frequent closed sequence pattern; subsume index

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]


