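A Data Updating Algorithm Based on Frequent Patterns of Distributed Data Streams in a Grid Environment

(Published English title: On the data updating algorithm based on distributed data flow under grid environment)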

Abstract: In a grid environment, keeping all distributed data sources consistently updated is not achievable in practice. Computing the frequent items of distributed data streams, so that data can be updated and cleaned selectively, is a current research focus. This paper proposes the MDF algorithm (Mining Distributed Frequent items), which computes the frequent items of distributed data streams to serve update needs such as frequently updated and frequently queried data. The root node and the other nodes do their processing independently, using simple bit-string operations and a frequent-item replica policy, which greatly reduces the computational load on each node. An exact formula for setting each node's frequency threshold is also given. The algorithm is tested on real data. The experimental results show that MDF computes the frequent items of distributed data streams effectively and improves the efficiency of data updating in a grid environment.
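The abstract does not give the details of MDF, so the following is only a minimal sketch of the general idea of finding frequent items across several nodes, not the authors' algorithm. It assumes a generic two-round scheme: each node counts its own stream and reports the items that reach the support ratio locally, and the root then asks every node for exact counts of those candidates only. The function names, the `support` parameter, and the local threshold `support * local_total` are all assumptions made for illustration. The paper's bit-string operations, replica policy, and threshold formula are not modeled here.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set


def local_candidates(counts: Counter, support: float) -> Set[Hashable]:
    """Return items whose local count reaches support * (local stream length)."""
    local_total = sum(counts.values())
    return {item for item, c in counts.items() if c >= support * local_total}


def distributed_frequent_items(
    streams: List[Iterable[Hashable]], support: float
) -> Dict[Hashable, int]:
    """Hypothetical two-round scheme; NOT the MDF algorithm from the paper."""
    # Each node counts its own stream independently of the others.
    node_counts = [Counter(stream) for stream in streams]

    # Round 1: the root gathers the union of each node's local candidates.
    # An item that reaches the support ratio globally must reach it at
    # one node at least, so no globally frequent item is missed here.
    candidates: Set[Hashable] = set()
    for counts in node_counts:
        candidates |= local_candidates(counts, support)

    # Round 2: the root requests exact counts for the candidates only
    # and keeps those that reach the global threshold.
    global_total = sum(sum(c.values()) for c in node_counts)
    frequent: Dict[Hashable, int] = {}
    for item in candidates:
        global_count = sum(c[item] for c in node_counts)
        if global_count >= support * global_total:
            frequent[item] = global_count
    return frequent
```

For example, with three node streams `"aaab"`, `"abcc"`, and `"acdd"` and `support=0.3`, the local candidates are `a`, `c`, and `d`, but only `a` (global count 5 out of 12) reaches the global threshold, so the result is `{"a": 5}`.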

Authors: 祖悦 (Zu Yue), 党德玉 (Dang Deyu)

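Affiliations: Personnel Department, Jilin Institute of Chemical Technology; School of Information Engineering, Northeast Electric Power University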

Source: Journal of Jilin Institute of Chemical Technology (《吉林化工学院学报》), CAS, 2009, No. 1, pp. 54-58 (5 pages)

Keywords: data mining; data updating in grid environment; distributed data flow; frequent items

Classification: TP319 [Automation and Computer Technology: Computer Software and Theory]



