Mining Method of Data Stream Maximum Frequent Itemsets

Cited by: 2

Abstract: AFMI, a method for mining maximum frequent itemsets based on a transaction matrix, is proposed. It finds the maximum frequent itemsets across all transactions by iteratively condensing the transaction matrix, then searches for them among the subsets of the intersections of the condensed transaction vectors, which narrows the search range. Logical operations and pruning are used to improve mining efficiency. Building on AFMI, the algorithm AFMI+ is proposed for mining maximum frequent itemsets in a sliding window over a data stream; it lets users periodically mine the maximum frequent itemsets in the current window. Experimental results show that both AFMI and AFMI+ perform well.
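The abstract does not spell out the steps of AFMI or AFMI+, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes a transaction matrix stored as one bit vector per item, support counting by logical AND, and superset-based pruning. The names `build_item_vectors`, `maximal_frequent_itemsets`, and `SlidingWindowMiner` are hypothetical, and the sliding-window class simply re-mines the current window on demand, which may differ from how AFMI+ maintains its state.

```python
from collections import deque


def build_item_vectors(transactions):
    """Return {item: bitmask}; bit r is set when transaction r contains the item."""
    vectors = {}
    for row, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << row)
    return vectors


def support(bitmask):
    """Number of transactions whose bit is set in the mask."""
    return bin(bitmask).count("1")


def maximal_frequent_itemsets(transactions, min_support):
    """Depth-first search for maximal frequent itemsets (illustrative only)."""
    vectors = build_item_vectors(transactions)
    # Condense the matrix: items below the threshold can never be frequent.
    frequent_items = sorted(i for i, v in vectors.items()
                            if support(v) >= min_support)
    maximal = []

    def extend(prefix, prefix_vec, candidates):
        # Intersect vectors with logical AND to count joint support.
        extensions = []
        for item in candidates:
            vec = prefix_vec & vectors[item]
            if support(vec) >= min_support:
                extensions.append((item, vec))
        # Pruning: skip the subtree if everything reachable from it is
        # already covered by a known maximal itemset.
        reachable = prefix | {item for item, _ in extensions}
        if any(reachable <= known for known in maximal):
            return
        if not extensions:
            if prefix:
                maximal.append(prefix)
            return
        for k, (item, vec) in enumerate(extensions):
            later = [i for i, _ in extensions[k + 1:]]
            extend(prefix | {item}, vec, later)

    all_rows = (1 << len(transactions)) - 1
    extend(frozenset(), all_rows, frequent_items)
    return maximal


class SlidingWindowMiner:
    """Keeps the most recent transactions and mines them when asked."""

    def __init__(self, window_size, min_support):
        self.window = deque(maxlen=window_size)  # oldest entries drop out
        self.min_support = min_support

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def mine(self):
        return maximal_frequent_itemsets(list(self.window), self.min_support)


if __name__ == "__main__":
    stream = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    miner = SlidingWindowMiner(window_size=3, min_support=2)
    for tx in stream:
        miner.add(tx)
    print(sorted(sorted(s) for s in miner.mine()))
```

Representing each item as an integer bitmask keeps support counting to a single AND plus a bit count, which is one common way to realize the "logical operations" the abstract mentions. Re-mining the whole window is the simplest correct choice for a sketch; an incremental update would be needed for high-rate streams.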

Authors: Zhang Yueqin (张月琴), Chen Dong (陈东)

Affiliation: College of Electronics and Information Engineering, Nanjing University of Technology (南京工业大学)

Source: Computer Engineering (《计算机工程》), 2010, No. 22, pp. 86-87, 90 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).

Funding: Nanjing University of Technology Young Teachers Academic Fund (39709013)

Keywords: data mining; data stream; sliding window; maximum frequent itemsets; matrix

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

