MESHJOIN*: An Algorithm Supporting Streaming Updates in a Real-time Data Warehouse
Abstract: This paper proposes MESHJOIN*, a new algorithm for supporting streaming updates in a real-time data warehouse environment. The algorithm has three distinguishing features: (1) relation R is organized into blocks and hashed, so that tuples irrelevant to the current join are read as rarely as possible; this reduces the number of tuples involved in each join and thereby improves join efficiency; (2) joins run concurrently on multiple threads, and the join operations and the reads of relation R are scheduled according to engineering principles so as to maximize the efficiency of the join algorithm; (3) real-time tuples and near-real-time tuples are scheduled according to the relationship between the current system service rate and the arrival rate of stream tuples, so that the processing requirements of real-time tuples are met. Experimental results show that MESHJOIN* achieves better performance than MESHJOIN.
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2010, No. 10, pp. 927-939 (13 pages)
Funding: National Natural Science Foundation of China, No. 50604012
Keywords: data warehouse; streaming update; join
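
The abstract gives no pseudocode, so the following is only a minimal sketch of two of the ideas it names, under stated assumptions. It is not the authors' implementation. The first part illustrates organizing relation R into hashed blocks so that a batch of stream tuples touches only the blocks it needs. The second part illustrates one possible way to favor real-time tuples when the arrival rate exceeds the service rate. All names (`NUM_BLOCKS`, `block_of`, `partition_relation`, `join_batch`, `schedule`, and the `realtime` flag) are hypothetical, and the deferral policy in `schedule` is an assumption rather than the rule used in the paper.

```python
from collections import defaultdict

NUM_BLOCKS = 4  # hypothetical number of hash blocks for relation R


def block_of(key, num_blocks=NUM_BLOCKS):
    """Map a join key to a block id (assumed hash partitioning)."""
    return hash(key) % num_blocks


def partition_relation(rows, key_field, num_blocks=NUM_BLOCKS):
    """Split relation R into blocks; each block indexes its rows by join key."""
    blocks = [defaultdict(list) for _ in range(num_blocks)]
    for row in rows:
        key = row[key_field]
        blocks[block_of(key, num_blocks)][key].append(row)
    return blocks


def join_batch(stream_batch, blocks, key_field):
    """Join stream tuples against R, reading only the blocks the batch needs."""
    by_block = defaultdict(list)
    for t in stream_batch:
        by_block[block_of(t[key_field], len(blocks))].append(t)

    results = []
    blocks_read = 0
    for block_id in sorted(by_block):
        block = blocks[block_id]  # stands in for reading one block from disk
        blocks_read += 1
        for t in by_block[block_id]:
            # .get avoids creating empty entries for keys absent from R
            for r in block.get(t[key_field], []):
                results.append({**r, **t})
    return results, blocks_read


def schedule(pending, arrival_rate, service_rate):
    """Assumed policy: when overloaded, run real-time tuples and defer the rest."""
    if arrival_rate <= service_rate:
        return list(pending), []
    run_now = [t for t in pending if t.get("realtime")]
    deferred = [t for t in pending if not t.get("realtime")]
    return run_now, deferred


if __name__ == "__main__":
    relation_r = [{"id": i, "name": f"p{i}"} for i in range(8)]
    blocks = partition_relation(relation_r, "id")
    batch = [{"id": 1, "qty": 5}, {"id": 5, "qty": 2}, {"id": 9, "qty": 1}]
    joined, blocks_read = join_batch(batch, blocks, "id")
    print(joined, blocks_read)
```

In this toy run all three stream keys hash to the same block, so only one of the four blocks is visited. A full scan of R, as in a plain nested-loop join, would touch every block regardless of which keys the batch contains.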