High Utility Fuzzy Itemsets Mining Over Data Stream Based on Sliding Window Model

Abstract: High utility itemset (HUI) mining can provide interesting itemsets, but it cannot provide information on the quantity of individual items. High utility fuzzy itemsets are therefore proposed. Real-world data, however, arrives continuously, and newly arriving data needs to be processed in real time. To address the inability of current high utility fuzzy itemset methods to handle data streams, a fuzzy utility list (FUL) structure is proposed. For the items in the current window, it stores the batch number, the identifier of the transaction containing the item, the item's fuzzy utility, and the item's remaining fuzzy utility. The structure supports efficient insertion and deletion of batches. Finally, a high utility fuzzy itemset mining algorithm for data streams is proposed based on FUL. Extensive experiments on real and synthetic datasets confirm the efficiency and feasibility of the algorithm.
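The abstract names the four fields kept in a fuzzy utility list and says that batches can be inserted and deleted as the window slides. The following is a minimal Python sketch of that idea, not the authors' implementation. The class names (`FULEntry`, `FuzzyUtilityList`, `SlidingWindow`), the method names, and the row layout are all assumptions made for illustration. How fuzzy utilities are derived from membership functions is not described in the abstract, so the caller supplies them as plain numbers.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FULEntry:
    """One row of a fuzzy utility list (field names are hypothetical)."""
    batch_id: int                   # batch number within the stream
    tid: int                        # transaction identifier
    fuzzy_utility: float            # fuzzy utility of the item in this transaction
    remaining_fuzzy_utility: float  # remaining fuzzy utility in this transaction


@dataclass
class FuzzyUtilityList:
    """Rows for a single item, kept in batch arrival order."""
    item: str
    entries: deque = field(default_factory=deque)

    def delete_batch(self, batch_id: int) -> None:
        # Rows are appended in arrival order, so the oldest batch
        # always sits contiguously at the left end of the deque.
        while self.entries and self.entries[0].batch_id == batch_id:
            self.entries.popleft()

    def total_fuzzy_utility(self) -> float:
        return sum(e.fuzzy_utility for e in self.entries)

    def total_remaining(self) -> float:
        return sum(e.remaining_fuzzy_utility for e in self.entries)


class SlidingWindow:
    """Keeps at most `window_size` batches and one list per item."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.batches = deque()  # batch ids currently in the window
        self.lists = {}         # item label -> FuzzyUtilityList

    def add_batch(self, batch_id, rows):
        # rows: iterable of (item, tid, fuzzy_utility, remaining_fuzzy_utility)
        if len(self.batches) == self.window_size:
            oldest = self.batches.popleft()
            for item in list(self.lists):
                self.lists[item].delete_batch(oldest)
                if not self.lists[item].entries:
                    del self.lists[item]  # item no longer occurs in the window
        self.batches.append(batch_id)
        for item, tid, fu, rfu in rows:
            ful = self.lists.setdefault(item, FuzzyUtilityList(item))
            ful.entries.append(FULEntry(batch_id, tid, fu, rfu))
```

Tagging each row with its batch number is what makes eviction cheap in this sketch: removing the oldest batch only pops rows from the front of each list and never rescans the rest of the window.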

Authors: Shan Zhihui; Han Meng; Han Qiang (School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China)

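Affiliations: School of Computer Science and Engineering, North Minzu University; The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University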

Source: Journal of Nanjing Normal University (Natural Science Edition), 2023, No. 1, pp. 120-129 (10 pages). Indexed in CAS and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (62062004, 61862001); Natural Science Foundation of Ningxia (2020AAC03216).

Keywords: data stream mining; sliding window; high utility itemsets mining; fuzzy utility; utility list

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


