Abstract
This paper proposes MESHJOIN*, a new algorithm for applying streaming updates in a real-time data warehouse environment. The algorithm has three distinguishing features: (1) Relation R is organized into blocks and hashed, so that tuples irrelevant to the current join are, as far as possible, never read; this reduces the number of tuples involved in each join and thereby improves join efficiency. (2) Multi-threaded concurrent join execution is adopted, and the join operations and the reads of relation R are scheduled according to engineering principles so as to maximize the efficiency of the join algorithm. (3) Real-time and near-real-time tuples are scheduled according to the relationship between the current system service rate and the arrival rate of stream tuples, so that the processing requirements of real-time tuples are met. Experimental results show that MESHJOIN* achieves better performance than MESHJOIN.
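Feature (1) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `partition_relation` and `join_stream`, the dictionary-based tuples, and the in-memory blocks are all assumptions made for illustration. The only idea taken from the abstract is that hashing relation R into blocks lets a stream tuple probe a single block instead of scanning all of R.

```python
from collections import defaultdict


def partition_relation(rows, key, num_blocks):
    """Split relation R into num_blocks blocks by hashing the join key.

    Hypothetical layout: each block is a dict mapping a key value to the
    R tuples that carry it. The paper's on-disk organization may differ.
    """
    blocks = [defaultdict(list) for _ in range(num_blocks)]
    for row in rows:
        k = row[key]
        blocks[hash(k) % num_blocks][k].append(row)
    return blocks


def join_stream(stream, blocks, key):
    """Join each stream tuple against only the block its key hashes to."""
    num_blocks = len(blocks)
    for s in stream:
        k = s[key]
        block = blocks[hash(k) % num_blocks]
        # Tuples of R stored in other blocks are never touched for this probe.
        for r in block.get(k, []):
            yield {**r, **s}


# Example data (invented for illustration only).
R = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "c"}]
S = [{"id": 1, "qty": 10}, {"id": 3, "qty": 5}]

blocks = partition_relation(R, "id", num_blocks=4)
joined = list(join_stream(S, blocks, "id"))
```

In this sketch the stream tuple with `id` 3 finds no partner and produces no output, while the tuple with `id` 1 inspects only the one block that can contain matches. The multi-threaded scheduling of feature (2) and the rate-based scheduling of feature (3) are not modeled here.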
Source
《计算机科学与探索》
CSCD
2010, No. 10, pp. 927-939 (13 pages)
Journal of Frontiers of Computer Science and Technology
Funding
National Natural Science Foundation of China, No. 50604012
Keywords
data warehouse; streaming update; join