Research on Global Sequential Pattern Mining in a Distributed Environment (cited by: 2)

Global sequential pattern mining in distributed environment
Abstract: Mining global sequential patterns in a distributed environment often generates too many candidate sequences, which increases network communication cost. To address this problem, a fast algorithm for mining global sequential patterns in distributed systems, Fast Mining of Global Sequential Pattern (FMGSP), is proposed. The algorithm compresses the local sequential patterns obtained at each site into a lexicographic sequence tree, which avoids transmitting repeated sequence prefixes. Because the node sequences in the merged tree are regular and simple, a pruning strategy named Item Extension and Sequence Extension (I/S-E) pruning is introduced. It effectively reduces the number of candidate sequences and the volume of network traffic, so that global sequential patterns are generated quickly. Theoretical analysis and experiments show that FMGSP performs well on large data sets and mines global sequential patterns effectively.
Source: Computer Integrated Manufacturing Systems (《计算机集成制造系统》); indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 11, pp. 2229-2235 (7 pages).
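Funding: National Natural Science Foundation of China (60773103, 70472033, 60673060); National Science and Technology Infrastructure Platform Program (2004DKA20310); Natural Science Foundation of Jiangsu Province (BK2005047); Jiangsu Province "Qing Lan Project" (青蓝工程).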
Keywords: data mining; global sequential pattern; lexicographic sequence tree; item extension and sequence extension pruning
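
The prefix compression described in the abstract can be illustrated with a small sketch. This is not the FMGSP algorithm itself, whose details are not given in this record. It is a plain prefix tree under simplifying assumptions: each pattern is a tuple of single items (no itemset elements, so item extension and sequence extension are not distinguished), each site reports only its locally frequent patterns with their local support, and all names (`Node`, `build_tree`, `merge`, `global_patterns`, the sample sites) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a prefix tree; the path from the root spells a pattern."""
    support: int = 0  # summed local support of the pattern ending at this node
    children: dict = field(default_factory=dict)


def insert(root, pattern, support):
    """Add one pattern, sharing nodes with any existing common prefix."""
    node = root
    for item in pattern:
        node = node.children.setdefault(item, Node())
    node.support += support


def build_tree(patterns):
    """Compress a site's {pattern: support} mapping into one prefix tree."""
    root = Node()
    for pattern, support in patterns.items():
        insert(root, pattern, support)
    return root


def merge(target, source):
    """Fold a tree received from another site into target, summing supports."""
    target.support += source.support
    for item, child in source.children.items():
        merge(target.children.setdefault(item, Node()), child)


def count_nodes(node):
    """Number of non-root nodes, i.e. items sent when the tree is transmitted."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def global_patterns(node, min_support, prefix=()):
    """Yield (pattern, support) pairs whose summed support reaches min_support.

    Assumption: the sum is only a lower bound, because a site contributes
    nothing for patterns that were not locally frequent there.
    """
    for item, child in sorted(node.children.items()):
        pattern = prefix + (item,)
        if child.support >= min_support:
            yield pattern, child.support
        yield from global_patterns(child, min_support, pattern)


# Hypothetical local results from two sites.
site_a = {("a",): 5, ("a", "b"): 4, ("a", "b", "c"): 3, ("a", "c"): 3}
site_b = {("a",): 6, ("a", "b"): 2, ("b",): 4}

tree_a = build_tree(site_a)
flat_items = sum(len(p) for p in site_a)  # items sent without prefix sharing
tree_items = count_nodes(tree_a)          # items sent as a tree

merged = build_tree(site_a)
merge(merged, build_tree(site_b))
result = list(global_patterns(merged, min_support=6))
print(flat_items, tree_items, result)
```

In this toy input, sending site A's patterns one by one repeats the prefix `a` in every pattern, while the tree stores it once. The pruning strategy named in the abstract is not modelled here.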
