Co-occurrence Order-preserving Pattern Mining (一种共生保序模式挖掘算法)

Abstract: Order-preserving pattern (OPP) mining is an emerging direction in data mining that finds subsequences of a time series sharing the same relative order. Existing OPP mining algorithms can find all frequent patterns effectively, but they are inefficient when the user is interested only in a specific pattern and in the patterns that have it as a prefix. To address this, the paper proposes a co-occurrence order-preserving pattern (COOP) mining algorithm, which mines the co-occurrence OPPs that have a given pattern as their prefix. COOP consists of two main parts: fusion preparation and calculation of the support of super-patterns. Fusion preparation has four steps: obtain the suffix OPP of pattern p, calculate the occurrences of the suffix OPP, verify the occurrences of pattern p in the forward direction, and search backward for the occurrences of all fusible patterns. When calculating the support of super-patterns, a pruning strategy further reduces the number of candidate patterns. Experimental results on real datasets verify the efficiency of COOP.
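To make the terms in the abstract concrete, the following is a minimal sketch of a naive baseline, not the COOP algorithm itself. It assumes that an occurrence is a contiguous window of the time series whose relative order equals the pattern, that support is the number of such windows, and that all values are distinct. The function names `relative_order`, `occurrences`, `extend_once`, and `mine_with_prefix` are hypothetical and chosen for illustration only. The sketch grows the given prefix one element at a time and keeps the super-patterns whose support reaches a threshold; it includes none of the fusion preparation or pruning steps described in the abstract.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def relative_order(window: Sequence[float]) -> Pattern:
    """Rank of each element inside the window (1 = smallest).

    Assumes distinct values; ties would be broken by position.
    """
    by_value = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(by_value, start=1):
        ranks[idx] = rank
    return tuple(ranks)


def occurrences(series: Sequence[float], pattern: Pattern) -> List[int]:
    """0-based end positions of windows whose relative order is `pattern`."""
    m = len(pattern)
    return [
        start + m - 1
        for start in range(len(series) - m + 1)
        if relative_order(series[start:start + m]) == tuple(pattern)
    ]


def extend_once(series: Sequence[float], prefix: Pattern,
                min_sup: int) -> Dict[Pattern, int]:
    """Frequent patterns of length len(prefix) + 1 that have `prefix` as prefix."""
    m = len(prefix)
    counts: Counter = Counter()
    for start in range(len(series) - m):
        window = series[start:start + m + 1]
        # The first m elements must reproduce the prefix's relative order.
        if relative_order(window[:m]) == tuple(prefix):
            counts[relative_order(window)] += 1
    return {p: c for p, c in counts.items() if c >= min_sup}


def mine_with_prefix(series: Sequence[float], prefix: Pattern,
                     min_sup: int) -> Dict[Pattern, int]:
    """Naively mine all frequent super-patterns that start with `prefix`."""
    results: Dict[Pattern, int] = {}
    frontier = [tuple(prefix)]
    while frontier:
        next_frontier = []
        for pattern in frontier:
            extended = extend_once(series, pattern, min_sup)
            results.update(extended)
            next_frontier.extend(extended)
        frontier = next_frontier  # patterns grow by one, so this terminates
    return results


if __name__ == "__main__":
    series = [10, 30, 20, 40, 25, 50, 35, 60]
    print(occurrences(series, (1, 2)))
    print(mine_with_prefix(series, (1, 2), min_sup=3))
```

This baseline rescans the whole series for every candidate, which is the kind of cost the abstract's fusion preparation and pruning strategy are meant to reduce.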

Authors: WANG Zhen (王珍); WU Youxi (武优西); MENG Yufei (孟玉飞); LI Yan (李艳) (School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; School of Economics and Management, Hebei University of Technology, Tianjin 300401, China)

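Affiliations: School of Artificial Intelligence and Data Science, Hebei University of Technology; School of Economics and Management, Hebei University of Technology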

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 6, pp. 1384-1391 (8 pages)

Funding: Supported by the Hebei Provincial Social Science Foundation (project HB19GL055).

Keywords: sequential pattern mining; time series; order-preserving pattern; co-occurrence pattern

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
