Research on Global Sequential Pattern Mining in Distributed Environments (分布式环境下全局序列模式挖掘技术研究). Cited by: 2

Global sequential pattern mining in distributed environment


Abstract: Mining global sequential patterns in a distributed environment often generates too many candidate sequences, which increases network communication cost. To address this problem, a fast algorithm for mining global sequential patterns in distributed systems, Fast Mining of Global Sequential Pattern (FMGSP), is proposed. The algorithm compresses the local sequential patterns obtained at each site into a lexicographic sequence tree, which avoids transmitting repeated sequence prefixes. Because the node sequences in the merged tree are regular and simple, a pruning strategy named Item Extension and Sequence Extension (I/S-E) pruning is proposed; it effectively reduces the candidate sequences and the volume of network traffic, so global sequential patterns are generated quickly. Theoretical analysis and experiments show that FMGSP performs well on large data sets and mines global sequential patterns effectively.
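The prefix-sharing idea in the abstract can be illustrated with a small sketch. The abstract gives no implementation details, so this is not the FMGSP algorithm itself: the names `to_steps`, `build_tree`, `count_nodes`, and `merge`, the pattern encoding, and the sample data are all assumptions made for illustration, and the I/S-E pruning step is not shown. The sketch stores each local pattern as a path of sequence extensions ("S", a new element) and item extensions ("I", an item added to the last element), so patterns with a common prefix share nodes.

```python
from typing import Dict, List, Optional, Tuple

Pattern = Tuple[Tuple[str, ...], ...]  # ordered itemsets, e.g. (("a",), ("b", "c"))
Step = Tuple[str, str]                 # ("S", item) or ("I", item)


def to_steps(pattern: Pattern) -> List[Step]:
    """Spell a pattern as a path of sequence (S) and item (I) extensions."""
    steps: List[Step] = []
    for itemset in pattern:
        for pos, item in enumerate(sorted(itemset)):
            # The first item of an itemset opens a new element (S-extension);
            # later items join the current element (I-extension).
            steps.append(("S" if pos == 0 else "I", item))
    return steps


class Node:
    def __init__(self) -> None:
        self.children: Dict[Step, "Node"] = {}
        self.support: Optional[int] = None  # set only where a pattern ends


def build_tree(patterns: List[Tuple[Pattern, int]]) -> Node:
    """Insert every (pattern, local support) pair into one shared-prefix tree."""
    root = Node()
    for pattern, support in patterns:
        node = root
        for step in to_steps(pattern):
            node = node.children.setdefault(step, Node())
        node.support = support
    return root


def count_nodes(node: Node) -> int:
    """Number of non-root nodes, i.e. labelled items that would be sent."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def merge(dst: Node, src: Node) -> None:
    """Fold src into dst, adding up the supports that each site reported."""
    for step, child in src.children.items():
        target = dst.children.setdefault(step, Node())
        if child.support is not None:
            target.support = (target.support or 0) + child.support
        merge(target, child)


# Hypothetical local patterns from two sites: (pattern, local support).
site_a = [((("a",),), 5), ((("a",), ("b",)), 4),
          ((("a",), ("b",), ("c",)), 3), ((("a", "b"),), 3)]
site_b = [((("a",),), 6), ((("a",), ("b",)), 2)]

tree_a = build_tree(site_a)
flat_items = sum(len(to_steps(p)) for p, _ in site_a)
print(flat_items, count_nodes(tree_a))  # 8 4

merged = build_tree(site_b)
merge(merged, tree_a)
print(merged.children[("S", "a")].support)  # 11
```

In this toy data the four patterns of site A contain eight items when listed one by one but need only four tree nodes. The summed value in the merged tree covers only the sites that reported a pattern, so it is a lower bound on the global support rather than the exact count.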

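Authors: 胡孔法 (Hu Kongfa), 张长海 (Zhang Changhai), 陈崚 (Chen Ling), 宋爱波 (Song Aibo), 达庆利 (Da Qingli)

Affiliations: Department of Computer Science and Engineering, Yangzhou University; School of Economics and Management, Southeast University; School of Computer Science and Engineering, Southeast University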

Source: Computer Integrated Manufacturing Systems (《计算机集成制造系统》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 11, pp. 2229-2235 (7 pages)

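Funding: National Natural Science Foundation of China (60773103, 70472033, 60673060); National Science and Technology Infrastructure Platform Program (2004DKA20310); Natural Science Foundation of Jiangsu Province (BK2005047); Jiangsu Province "Qing Lan Project" fund.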

Keywords: data mining; global sequential pattern; lexicographic sequence tree; item extension and sequence extension pruning

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


