
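A Data Updating Algorithm Based on Frequent Patterns of Distributed Data Streams in a Grid Environment

(Published English title: On the data updating algorithm based on distributed data flow under grid environment)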
Abstract: In a grid environment, it is not feasible to update all distributed data sources consistently within a given time. Selectively updating and cleaning data by computing the frequent items of distributed data streams is a current research focus. This paper proposes the MDF (Mining Distributed Frequent items) algorithm, which computes the frequent items of distributed data streams to meet data-updating needs such as handling frequently updated and frequently queried data. The root node and the other nodes process data independently, using simple bit-string operations and a frequent-item replica policy, which greatly reduces the computational load on each node. An exact formula for setting the frequency threshold at each node is also given. The algorithm is tested on real data. The experimental results show that MDF computes the frequent items of distributed data streams effectively and improves the efficiency of data updating in a grid environment.
Authors: ZU Yue (祖悦), DANG Deyu (党德玉)
Source: Journal of Jilin Institute of Chemical Technology (《吉林化工学院学报》, CAS), 2009, No. 1, pp. 54-58, 5 pages
Keywords: data mining; data updating in grid environment; distributed data flow (the Chinese keyword list gives 双数据流, "dual data flow"); frequent items
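
The abstract describes nodes that process their own streams independently and a root node that combines the results using bit-string operations. The record gives no further detail, so the following is only a minimal sketch of that general idea and not the MDF algorithm itself. The function names (`local_frequent`, `merge_at_root`), the relative threshold of 0.3, the sample streams, and the use of an integer bitmask to record which nodes reported an item are all assumptions made for illustration.

```python
from collections import Counter


def local_frequent(stream, threshold):
    """Hypothetical node step: keep items whose local relative
    frequency is at least `threshold` (an assumed parameter)."""
    counts = Counter(stream)
    total = len(stream)
    return {item: c for item, c in counts.items() if c >= threshold * total}


def merge_at_root(node_reports):
    """Hypothetical root step: combine the per-node reports.

    masks[item] is an integer used as a bit string; bit i is set
    when node i reported the item as locally frequent.
    totals[item] sums only the reported counts, so it is a lower
    bound on the item's true global count.
    """
    masks = {}
    totals = Counter()
    for i, report in enumerate(node_reports):
        for item, count in report.items():
            masks[item] = masks.get(item, 0) | (1 << i)
            totals[item] += count
    return masks, totals


# Illustrative data only; not taken from the paper.
streams = [
    ["a", "b", "a", "c", "a", "b"],
    ["a", "d", "d", "a", "d", "e"],
    ["b", "b", "a", "b", "f", "b"],
]
reports = [local_frequent(s, 0.3) for s in streams]
masks, totals = merge_at_root(reports)

for item in sorted(masks):
    print(item, format(masks[item], "03b"), totals[item])
```

In this sketch each node sends only its locally frequent items to the root, which keeps the per-node work to a single counting pass. How the paper actually sets the node thresholds and maintains frequent-item replicas is not stated in this record.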
