Abstract
In a grid environment, it is not feasible to update all distributed data sources consistently. Computing the frequent items of distributed data streams, so that data can be updated and cleaned selectively, is an active research topic. This paper proposes the MDF (Mining Distributed Frequent items) algorithm, which computes the frequent items of distributed data streams to serve update needs such as frequently updated and frequently queried data. The root node and the other nodes process data independently, and the algorithm relies on simple bit-string operations and a frequent-item replica policy, which greatly reduces the computational load on each node. An exact formula for setting each node's frequency threshold is also given. The algorithm is evaluated on real data, and the experimental results show that MDF computes the frequent items of distributed data streams effectively and improves the efficiency of data updating in a grid environment.
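The problem setting described above can be illustrated with a minimal Python sketch. This is not the MDF algorithm: the abstract does not give the bit-string operations, the replica policy, or the node threshold formula, so none of them are reproduced here. The sketch assumes a generic two-phase scheme in which each node reports items whose local frequency reaches the support ratio, and the root then sums exact counts for the reported candidates. The function names `local_candidates` and `global_frequent_items` and the parameter `support` are hypothetical.

```python
from collections import Counter


def local_candidates(stream, support):
    """Count items on one node and return (counts, locally frequent items).

    Assumption: the local threshold is support * len(stream). An item that is
    globally frequent must reach this ratio on at least one node, so the union
    of local candidates cannot miss a globally frequent item.
    """
    counts = Counter(stream)
    threshold = support * len(stream)
    return counts, {item for item, c in counts.items() if c >= threshold}


def global_frequent_items(node_streams, support):
    """Root-side merge: union the candidates, then sum exact counts per item."""
    per_node = [local_candidates(s, support) for s in node_streams]
    candidates = set().union(*(cands for _, cands in per_node))
    total = sum(len(s) for s in node_streams)
    merged = {
        item: sum(counts[item] for counts, _ in per_node)
        for item in candidates
    }
    # Keep only candidates that are frequent over all nodes combined.
    return {item: c for item, c in merged.items() if c >= support * total}


if __name__ == "__main__":
    streams = [
        ["a", "a", "b", "c"],
        ["a", "d", "d", "d"],
        ["b", "a", "e", "a"],
    ]
    print(global_frequent_items(streams, support=0.4))
```

In this toy run, "d" is frequent on one node but not globally, so the root discards it after summing the exact counts. A real deployment would process unbounded streams incrementally rather than complete lists.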
Source
Journal of Jilin Institute of Chemical Technology (《吉林化工学院学报》)
CAS
2009, Issue 1, pp. 54-58 (5 pages)
Keywords
data mining
data updating in grid environment
distributed data flow
frequent items