期刊文献+

基于扩展自然序树的概化关联规则增量挖掘方法 被引量:8

An Incremental Method for Mining Generalized Association Rules Based on Extended Canonical-Order Tree
下载PDF
导出
摘要 概化关联规则挖掘作为数据挖掘领域一个重要的拓展性研究课题,首先提出了一种概化扩展自然序树(generalized extended canonical-order tree,GECT)结构及其增量挖掘算法GECT-IM.该算法对原始分类事务数据库只扫描一次,就可以将所有交易信息映射至一棵压缩格式的GECT,然后通过对更新交易数据集扫描得到更新数据集中各项集的计数,结合相关性质及运算就可以发现大部分更新后的概化频繁项集;其次,针对GECT规模较大以及GECT-IM 算法仍然可能需要遍历初始GECT树的局限,在界定数据库更新和重构概念的基础上,基于一种可量化度量的准最小支持度阈值,提出了一种改进的准频繁概化扩展自然序树(pre-large generalized extended canonical-order tree,PGECT)结构及其增量挖掘算法PGECT-IM.由于有效避免了对初始GECT进行遍历的情形,从而进一步提升了概化关联规则增量挖掘效率.实验证明,提出的概化关联规则增量挖掘算法 GECT-IM 及其优化算法PGECT-IM,比现有增量挖掘算法具有更高的挖掘效率和更好的扩展性. Mining generalized association rules is one of the important research areas in data mining.In real life applications,the transaction database is updated frequently.It makes the maintenance of generalized association rules one of challenging research work.In this paper,firstly,by analyzing and summarizing all the update cases of the taxonomy data,several special properties of updating are concluded;secondly,we project the transaction databases to a new compact structure called GECT(generalized extended canonical-order tree) composed of two header tables that can be used to mine the whole updated tree and the incremental tree.Thirdly,we propose an incremental updating algorithm GECT-IM,which finds most updated frequent itemsets by scanning the updating transactions set instead of the original database;To tackle the limit of GECT-IM,which still need scan the GECT when the infrequent itemsets become frequent,we propose a further optimized structure called PGECT(pre-large generalized extended canonical-order tree) and an efficient algorithm PGECT-IM.Within the certain updating scope,it can find all the updated frequent itemsets without rescanning the original PGECT.The experiments on synthetic datasets show that our algorithms,both GECT-IM and PGECT-IM,are not only correct and complete but also outperform the well-known and current algorithms.
出处 《计算机研究与发展》 EI CSCD 北大核心 2012年第3期598-606,共9页 Journal of Computer Research and Development
基金 国家自然科学基金项目(90818023)
关键词 分类数据 概化关联规则 增量挖掘 概化扩展自然序树 准频繁概化扩展自然序树 taxonomy generalized association rule incremental updating generalized extended canonical-order tree(GECT) pre-large generalized extended canonical-order tree(PGECT)
  • 相关文献

参考文献17

  • 1Agrawal R, Srikant R. association rules [C]//Proc Data Bases. San Francisco: Fast algorithms for mining of the Int Conf on Very Large Morgan Kaufmann, 1994: 487-499.
  • 2Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation [C]//Proc of the ACM Annual Conf on Management of Data. New York: ACM, 2000:1-12.
  • 3Srikant R, Agrawal R. Mining generalized association rules [C] //Proe of the Int Conf on Very Large Data Bases. San Francisco: Morgan Kaufmann, 1995:407-419.
  • 4Hipp J, Myka A, Wirth R, et al. A new algorithm for faster mining of generalized association rules [C] //Proc of the European Syrup on principles of data mining and knowledge discovery. Berlin: Springer, 1998:74-82.
  • 5Sriphaew K, Theeramunkong T. Fast algorithm for mining generalized frequent patterns of generalized association rules[J]. IEICE Trans on Information and Systems, 2004, E87-D (3) : 761-770.
  • 6Pramudiono I, Kitsuregawa M. FP-tax: Tree structure based generalized association rule mining [C] //Proc of the ACM Workshop on Research Issues in Data Mining and Knowledge Discovery. New York: ACM, 2004: 60-63.
  • 7Daniel Kunkle,张冬晖,Gene Cooperman.Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy[J].Journal of Computer Science & Technology,2008,23(1):77-102. 被引量:2
  • 8Cheung D, Han J, Ng V, etal. Maintenance of discovered association rules in large databases: An incremental updating technique [C] //Proc of the 12th Int Conf on Data Engineering. Los Alamitos: IEEE Computer Society, 1996: 106-114.
  • 9Cheung D, Lee S D, Kao B. A general incremental technique for updating discovered association rules [C]//Proc of the 5th Int Conf on Database Systems for Advanced Applications. Berlin: Springer, 1997: 185-194.
  • 10Ayan N F, Tansel A U, Arkun M E. An efficient algorithm to update large itemsets with early pruning [C]//Proc of the 5th ACM Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 1999:287-291.

二级参考文献76

  • 1[1]Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proceedings of ACM SIGMOD International Conference on Management of Date, Washington DC, 1993.207~216
  • 2[2]Agrawal R, Srikant R. Fast algorithm for mining association rules. In: Proceedings of the 20th International Conference on VLDB, Santiago, Chile, 1994. 487~499
  • 3[3]Han J, Kamber M. Data Mining: Concepts and Techniques. Beijing: Higher Education Press, 2001
  • 4[5]Agrawal R, Shafer J C. Parallel mining of association rules:Design, implementation, and experience. IBM Research Report RJ 10004,1996
  • 5[6]Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules. In: Proceedings of the 21th International Conference on VLDB, Zurich, Switzerland, 1995. 432~444
  • 6[7]Hah J, Jian P et al. Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Dallas, TX, 2000.1~12
  • 7[8]Cheung D W, Lee S D, Kao B. A general incremental technique for maintaining discovered association rules. In: Proceedings of databases systems for advanced applications, Melbourne, Australia, 1997. 185~194
  • 8[10]Han J, Jian P. Mining access patterns efficiently from web logs. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'00), Kyoto, Japan,2000. 396~407
  • 9[11]Agrawal R, Srikant R. Mining sequential pattern. In: Proceedings of the 11th International Conference on Data Engineering, Taipei, 1995. 3~14
  • 10Ramaswamy S. et al.. On the discovery of interesting patterns in association rules. In: Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), New York, 1998, 368~379

共引文献111

同被引文献83

引证文献8

二级引证文献43

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部