
Study of Loading Strategy in Shared-Nothing Event Stream Parallel Database Systems

Cited by: 1
Abstract: Skew is a pervasive problem in parallel systems and has a significant impact on their performance. An event stream database serves as the back-end analysis and processing system for data stream applications such as data stream management systems (DSMS). Its workload differs from that of a traditional database system: it must load a continuous, fast-arriving, high-volume event stream while simultaneously answering user queries quickly. Under these conditions, conventional data redistribution solutions to data skew cannot accommodate the dynamic loading. Motivated by a backbone network security monitoring application and by the characteristics of event stream workloads, this paper presents a periodical-counting-based capability-aware (PCCA) loading strategy built on DBroker, a shared-nothing event stream parallel database system. The strategy keeps event stream data loading fast and correct, and it automatically adjusts the data distribution online according to the loading capability of each node, which both prevents and resolves system skew. It also lays a good data distribution foundation for query service performance. Simulation model analysis and tests on a real system both show that PCCA performs better than the three other strategies it was compared against.
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2009, No. 1, pp. 159-166 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National High Technology Research and Development Program of China (863 Program), grant 2006AA01A102.
Keywords: skew; event stream; parallel database; load balancing; loading strategy
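The abstract names the two ingredients of the loading strategy, periodic counting and awareness of each node's loading capability, but gives no algorithmic detail. The following Python sketch is only an illustration of how those two ideas could fit together, not the PCCA algorithm as implemented in DBroker. The class `PeriodicCapabilityBalancer`, its methods, and the choice of largest-remainder apportionment are all hypothetical assumptions: the dispatcher counts how many events each node finishes loading in a period, treats that count as the node's observed capability, and uses it to set the node's share of batches in the next period.

```python
from dataclasses import dataclass, field


@dataclass
class PeriodicCapabilityBalancer:
    """Hypothetical dispatcher: per-period load counts set the next period's shares."""
    nodes: list[str]
    counts: dict[str, int] = field(default_factory=dict)
    weights: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: start with equal shares and empty counters.
        self.counts = {n: 0 for n in self.nodes}
        self.weights = {n: 1.0 / len(self.nodes) for n in self.nodes}

    def record_loaded(self, node: str, n_events: int) -> None:
        # Called when a node confirms that it finished loading n_events.
        self.counts[node] += n_events

    def end_period(self) -> None:
        # Treat the events loaded during the period as observed capability.
        total = sum(self.counts.values())
        if total > 0:
            self.weights = {n: c / total for n, c in self.counts.items()}
        # Reset the counters for the next period.
        self.counts = {n: 0 for n in self.nodes}

    def plan(self, n_batches: int) -> dict[str, int]:
        # Largest-remainder apportionment of batches according to the weights.
        exact = {n: self.weights[n] * n_batches for n in self.nodes}
        plan = {n: int(exact[n]) for n in self.nodes}
        leftover = n_batches - sum(plan.values())
        by_remainder = sorted(
            self.nodes, key=lambda n: exact[n] - plan[n], reverse=True
        )
        for n in by_remainder[:leftover]:
            plan[n] += 1
        return plan
```

In this sketch a caller would invoke `record_loaded` as nodes acknowledge batches, call `end_period` on a timer, and use `plan` to decide how many of the next batches go to each node. A slower node then receives a smaller share in the following period.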


