An interestingness-based hash mining algorithm for frequent patterns over data streams (cited by: 4)

Mining approximate frequency itemsets over data streams based on hash and interesting degree


Abstract: Frequent pattern mining underlies much of the work in data stream mining. Existing algorithms can mine approximate frequent patterns from data streams, but because stream data are uncertain, continuous, and massive, these algorithms have not been able to keep their time and space costs within an acceptable range. This paper uses a hash table as the storage structure for the synopsis data and introduces the notion of the interestingness of association rules, proposing a frequent pattern mining algorithm for data streams, MIFS-HT (mining interesting frequent itemsets with hash table). The algorithm lowers the time and space complexity of existing algorithms and increases the practical value of the results. Experimental results show that MIFS-HT is an efficient algorithm for mining frequent patterns over data streams, that it outperforms algorithms such as FPStream and LossyCounting, and that its mining results are more meaningful in practice.

Abstract (English): Frequent itemset mining, a basic task in data stream mining, has received increasing attention from researchers. Because data streams are uncertain, continuous, and large in volume, many mining algorithms have difficulty handling such dynamic data. This paper introduces a hash table and the interestingness of association rules: the former represents the synopsis data structure, and the latter captures what customers pay attention to. On this basis, a new frequent itemset mining algorithm named MIFS-HT (mining interesting frequent itemsets with hash table) is proposed. Compared with Lossy Counting and a similar algorithm, mining frequent item sets over data streams by matrix (MISM for short), the results show that MIFS-HT is more efficient in both time and space.
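The abstract names Lossy Counting as a baseline and a hash table as the synopsis structure. As a rough illustration only, the following Python sketch shows Lossy Counting over single items with a `dict` serving as the hash-table synopsis. It is not MIFS-HT, whose details are not given in this record; it handles single items rather than itemsets and applies no interestingness measure. The function names `lossy_count` and `frequent_items` and the parameters `epsilon` and `support` are assumptions made for this example.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item counts with Lossy Counting (single items only).

    Illustrative sketch; not the MIFS-HT algorithm from the paper.
    """
    width = math.ceil(1 / epsilon)   # bucket width
    synopsis = {}                    # item -> [count, max_undercount]; dict acts as the hash table
    n = 0                            # number of items seen so far
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)            # id of the current bucket
        if item in synopsis:
            synopsis[item][0] += 1
        else:
            synopsis[item] = [1, bucket - 1]     # may have been missed in earlier buckets
        if n % width == 0:                       # bucket boundary: prune rare entries
            stale = [k for k, (c, d) in synopsis.items() if c + d <= bucket]
            for key in stale:
                del synopsis[key]
    return synopsis, n


def frequent_items(synopsis, n, support, epsilon):
    """Return items whose stored count reaches (support - epsilon) * n."""
    return {k for k, (c, _) in synopsis.items() if c >= (support - epsilon) * n}
```

For example, `lossy_count(list("abacabad"), 0.25)` followed by `frequent_items(synopsis, n, 0.5, 0.25)` reports only the item `"a"`; the other items are pruned at bucket boundaries, which is how the synopsis stays small.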

Authors: 琚春华 (Ju Chunhua), 殷贤君 (Yin Xianjun)

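Affiliations: School of Computer Science and Information Engineering, Zhejiang Gongshang University; Modern Business and Trade Center, Zhejiang Gongshang University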

Source: Systems Engineering-Theory & Practice (《系统工程理论与实践》; indexed in EI, CSSCI, CSCD, PKU Core), 2012, No. 12, pp. 2764-2773 (10 pages)

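Funding: National Natural Science Foundation of China (71071141); Specialized Research Fund for the Doctoral Program of Higher Education (20103326110001); Key Project of the Natural Science Foundation of Zhejiang Province (Z1091224); Modern Business and Trade Center of Zhejiang Gongshang University (11JDSM02Z)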

Keywords: data stream; frequent itemset; degree of interestingness; MIFS-HT

Classification number: U491.17 [Transportation Engineering: Transportation Planning and Management]
