期刊文献+

分布式全局频繁项目集的快速挖掘方法 被引量:11

Fast Mining Algorithm for Distributed Global Frequent Itemsets
下载PDF
导出
摘要 针对传统的分布式全局频繁项目集挖掘算法存在大量的候选项目集,且求全局频繁项目集的网络通信代价过高等问题,提出了一种分布式数据库的全局频繁项目集快速挖掘算法(FDMA).该算法改进了频繁模式树(FP-树)的结构,将双向FP-树改为单向,每个节点只保留指向父结点的指针,减少了指针数,由此可节省1/3的树空间;同时通过传送用3个很小的数组表示的被约束子树,在此挖掘全局频繁项目集的过程中不再生成大量候选项目集或条件FP-树,从而减小了网络通信量,提高了挖掘效率.实验表明,所提算法的挖掘速度比传统的分布式数据库数据挖掘算法至少提高了1倍之多,随着数据库规模的增大,它的扩展性将更好. Aiming at the problem that there exists prolific candidate sets and the communication cost is high for obtaining global frequent itemsets in distributed mining algorithms, an algorithm FDMA(fast distributed mining algorithm) for mining global frequent itemsets in distributed database is proposed. It improves the structure of frequent pattern tree (FP-tree) and the bidirectional FP-tree is transformed into unilateral one and only the pointer that points to the parent node is reserved at each node. The number of the pointer is decreased and one third of tree space can be saved. Meanwhile,by transmitting constrained sub-trees (consisting of three small arrays), the proposed algorithm doesn't generate conditional FP-tree or a large number of candidate sets in mining process, and thus the network traffic is reduced and the mining efficiency is increased. Experiments show that in comparison with DDDM (distributed dual decision miner) algorithm, the proposed algorithm increases the mining speed by at least two times. Moreover, the algorithm has better extensibility with increasing the scale of database.
作者 宋宝莉 覃征
出处 《西安交通大学学报》 EI CAS CSCD 北大核心 2006年第8期923-927,共5页 Journal of Xi'an Jiaotong University
基金 国家自然科学基金资助项目(60542004)
关键词 数据挖掘 分布式数据库 全局频繁项目集 被约束子树 data mining distributed database global frequent itemset constrained sub-tree
  • 相关文献

参考文献7

  • 1Agrawal R,Imielinski T,Swami A.Mining association rules between sets of items in large database[J].SIGMOD Record,1993,22(2):207-216.
  • 2Agrawal R,Srikant R.Fast algorithms for mining association rules in large databases[C]∥Proceedings of the 20th International Conference on Very Large Data Bases.San Mateo,USA:Morgan Kaufmann Publicationg Inc.,1994:487.
  • 3Agrawal R,Srikant R.Parallel mining of association rules[J].IEEE Trans on Knowledge and Data Engineering,1996,8(6):962-969.
  • 4Cheung D W,Han Jiawei,Ng V T,et al.A fast distributed algorithm for mining association rules[C]∥Proceedings of the 1996 4th International Conference on Parallel and Distributed Information Systems.Los Alamitos,USA:IEEE Computer Society,1996:31-42.
  • 5Schuster A,Wolff R.Communication efficient distributed mining of association rules[C]∥Proceedings of the ACM SIGMOD International Conference on Management of Data.New York:ACM,2001:473-484.
  • 6Han Jiawei,Pei Jian,Yin Yiwen.Mining frequent patterns without candidate generation[C]∥Proceedings of the ACM SIGMOD International Conference on Management of Data.New York:ACM,2000:1-12.
  • 7范明,李川.在FP-树中挖掘频繁模式而不生成条件FP-树[J].计算机研究与发展,2003,40(8):1216-1222. 被引量:56

二级参考文献8

  • 1R Agrawal, R Srikant. Fast algorithms for mining association rules. In: Proc of 1994 Int'l Conf on Very Large Data Bases.Santiago, Chili: VLDB Endowment, 1994. 487--499.
  • 2J S Park, M S Chen, P S Yu. An effective Hash-based algorithm for mining association rules. In: Proc of 1995 ACM-SIGMOD Int'l Cord on Management of Data. San Jose, CA: ACM Press,1995. 175--186.
  • 3S Brin, R Motwani, C Silvemtein. Beyond market basket:Generalizing association rules to correlations. In: Proe of 1997 ACM-SIGMOD Int'l Conf on Management of Data. Tucson, AZ:ACM Press, 1997. 265--276.
  • 4R Agrawal, R Srikant. Mining sequential patterns. In: ICDE'95. Taipei, Taiwan: IEEE Computer Society Press, 1995. 3--14.
  • 5G Dong, J Li. Efficient mining of emerging patterns: Discovering trends and differences. In: Proc of the 5th ACM SIGKDD Int'l Conf on Knowledge Discovery and Data Mining. San Diego, CA:ACM Press, 1999. 43~52.
  • 6J Han, J Pei, Y Yin. Mining frequent patterns without candidate generation. In: Proe of 2000 ACM-SIGMOD Int'l Conf on Management of Data. Dallas, TX: ACM Press, 2000. 1--12.
  • 7Artur Bykowski, Christophe Rigotti. A eondemsed representation to find frequent patterns. In: Proe of the 20th ACM SIGACT-SIGMOD-SIGART Symp on Principles of Database Systems(PODS 2001). Santa Barbara, CA: ACM Press, 2001. 267~273.
  • 8范明 等.数据挖掘:概念与技术[M].北京:机械工业出版社,2001.8.

共引文献55

同被引文献74

引证文献11

二级引证文献39

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部