DMGSP: a fast distributed algorithm for mining global sequential patterns (cited by: 2)

DMGSP: an algorithm of distributed mining global sequential pattern on distributed system
Abstract: Distributed algorithms for mining global sequential patterns often generate too many candidate sequences, which increases network communication cost. To address this problem, a fast algorithm for mining global sequential patterns in a distributed environment, DMGSP (distributed mining of global sequential patterns), is proposed. DMGSP compresses the local sequential patterns obtained at each site into a lexicographic sequence tree, which avoids transmitting repeated sequence prefixes. Using rules for merging the node sequences in the tree together with item-extension and sequence-extension strategies, it prunes non-frequent sequences, which effectively reduces both the number of candidate sequences and the volume of network traffic, so that global sequential patterns are generated quickly. Algorithm analysis and experimental results show that DMGSP performs well on large datasets and mines global sequential patterns effectively.
Source: Journal of Southeast University (Natural Science Edition), 2007, No. 4, pp. 574-579 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (70472033); Jiangsu Province "Qing Lan Project" fund.
Keywords: data mining; distributed system; global sequential pattern; lexicographic sequence tree
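
The prefix compression and pruning described in the abstract can be illustrated with a small sketch. This is not the DMGSP algorithm itself, whose details the abstract does not give. It is a simplified prefix tree under stated assumptions: each sequence is a tuple of single items (no itemsets), each site reports every locally frequent pattern together with all of its prefixes, and a global support is approximated by summing the reported local counts. All names (`Node`, `insert`, `count_nodes`, `global_patterns`) and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


class Node:
    """One node of the prefix tree; the path from the root spells a sequence."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.support = 0  # sum of the local supports reported for this sequence


def insert(root: Node, seq: Sequence, support: int) -> None:
    """Add one local pattern; shared prefixes reuse existing nodes."""
    node = root
    for item in seq:
        node = node.children.setdefault(item, Node())
    node.support += support


def count_nodes(node: Node) -> int:
    """Number of nodes below `node`, i.e. items stored after prefix sharing."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def global_patterns(root: Node, min_support: int) -> Dict[Sequence, int]:
    """Walk the tree and skip any subtree whose prefix is below the threshold.

    Skipping is safe here only because of the assumption that every reported
    pattern is accompanied by all of its prefixes, so a prefix's summed
    support is never smaller than that of its extensions.
    """
    found: Dict[Sequence, int] = {}
    stack: List[Tuple[Sequence, Node]] = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for item in sorted(node.children):  # lexicographic order of items
            child = node.children[item]
            if child.support < min_support:
                continue  # prune: no extension of this prefix can qualify
            seq = prefix + (item,)
            found[seq] = child.support
            stack.append((seq, child))
    return found


# Hypothetical local patterns from two sites (pattern -> local support).
site_a = {("a",): 5, ("a", "b"): 4, ("a", "b", "c"): 3, ("b",): 4}
site_b = {("a",): 6, ("a", "b"): 2, ("a", "c"): 3}

root = Node()
for site in (site_a, site_b):
    for pattern, local_support in site.items():
        insert(root, pattern, local_support)

flat_items = sum(len(p) for site in (site_a, site_b) for p in site)
print(flat_items, count_nodes(root))
print(global_patterns(root, min_support=5))
```

In this toy data the two flat pattern lists contain 12 items in total, while the merged tree stores 5 nodes, which is the kind of saving that prefix sharing aims for. A real distributed miner would also need to collect supports from sites where a pattern is not locally frequent, which this sketch deliberately omits.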