
Efficient Method for Mining Multiple-Level and Generalized Association Rules
Cited by: 37
Abstract: Based on an in-depth study of taxonomy data, this paper proposes an efficient method for mining multiple-level association rules. First, a domain knowledge-based item correlation model (DICM) is built from the domain knowledge of the taxonomy data, and the items are hierarchically clustered according to this model. Second, the transaction database is reduced and partitioned according to the item clusters. Finally, the reduced and partitioned transaction database is mapped onto a compact AFOPT-tree structure, and frequent itemsets are mined by traversing the AFOPT-tree instead of the original transaction database. Because the transaction database is smaller and the compact AFOPT structure is used, the method saves I/O time and substantially improves the efficiency of multiple-level association rule mining. Based on this method, a top-down algorithm, TD-CBP-MLARM, and a bottom-up algorithm, BU-CBP-MLARM, are given for mining multiple-level association rules. The method is also extended to generalized association rule mining, yielding an efficient algorithm, CBP-GARM. Experiments on a large number of randomly generated synthetic datasets show that the proposed algorithms guarantee the correctness and completeness of the mined frequent itemsets and achieve better mining efficiency and scalability than the latest comparable algorithms.
Source: Journal of Software (软件学报), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, Issue 12, pp. 2965-2980 (16 pages)
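Funding: National Natural Science Foundation of China, Major Research Plan Key Project (90818023); National Basic Research Program of China (973 Program) (2005CB321905)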
Keywords: taxonomy data; multiple-level association rule; generalized association rule; hierarchical clustering; reduction and partitioning
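The pipeline outlined in the abstract (group items, reduce the transaction database per group, then mine frequent itemsets at several taxonomy levels) can be illustrated with a minimal Python sketch. This is not the paper's DICM, AFOPT-tree, or CBP algorithms: the taxonomy `PARENT`, the database `DB`, and the helpers `generalize`, `project`, and `frequent_itemsets` are all hypothetical names chosen for illustration, and a naive level-wise enumeration stands in for the AFOPT-tree traversal.

```python
from itertools import combinations

# Hypothetical two-level taxonomy (an assumption): leaf item -> parent category.
PARENT = {
    "skim_milk": "milk",
    "whole_milk": "milk",
    "white_bread": "bread",
    "wheat_bread": "bread",
}

# Hypothetical toy transaction database.
DB = [
    {"skim_milk", "white_bread"},
    {"whole_milk", "wheat_bread"},
    {"skim_milk", "wheat_bread"},
    {"whole_milk"},
]

def generalize(transaction, parent=PARENT):
    """Replace each leaf item by its parent category (one level up)."""
    return frozenset(parent.get(item, item) for item in transaction)

def project(transactions, cluster):
    """Reduce a database to one item cluster, dropping empty transactions."""
    reduced = (frozenset(t) & cluster for t in transactions)
    return [t for t in reduced if t]

def frequent_itemsets(transactions, min_count):
    """Naive level-wise search (not AFOPT); returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    result = {}
    size = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Build larger candidates only from items that are still frequent.
        alive = sorted({i for c in frequent for i in c})
        size += 1
        candidates = [frozenset(c) for c in combinations(alive, size)]
    return result

# Category level: mine the generalized database.
top_level = frequent_itemsets([generalize(t) for t in DB], min_count=3)

# Leaf level: mine the original items with a lower threshold.
leaf_level = frequent_itemsets(DB, min_count=2)

# Reduction: the bread cluster only needs the transactions that touch it.
bread_db = project(DB, frozenset({"white_bread", "wheat_bread"}))
```

In this toy setup the projected bread database holds three transactions instead of four, which hints at why reduction can lower scanning cost. Note that mining a single projected cluster on its own finds only itemsets inside that cluster; how the paper's algorithms preserve completeness across clusters is not covered by this sketch.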