Study of Loading Strategy in Shared-Nothing Event Stream Parallel Database Systems

Cited by: 1

Abstract: Skew is a common problem in parallel systems and has a great impact on their performance. An event stream database is the back-end data processing and analysis system of a data stream management system (DSMS). Its workload differs from that of a traditional database system: it must load continuous, fast-arriving, high-volume event stream data while also responding quickly to user queries. Under these conditions, the conventional data redistribution solutions to data skew are no longer suitable, because they cannot adapt to dynamic loading. With backbone network security monitoring as the application background, this paper presents a periodical counting based capability aware (PCCA) loading strategy built on DBroker, a shared-nothing event stream parallel database system for backbone network monitoring. The strategy keeps event stream data loading fast and correct, and it adjusts the data distribution online and automatically according to the loading capability of each node, which both prevents and resolves system skew. It also lays a good data distribution foundation for the query service. Both simulation model analysis and tests on a real system show that the PCCA loading strategy performs much better than the other three methods.

Authors: Liu Ying (刘莹), Wang Qirong (王启荣), Sun Ninghui (孙凝晖)

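Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of the Chinese Academy of Sciences; IBM China Software Development Center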

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2009, No. 1, pp. 159-166 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National High Technology Research and Development Program of China (863 Program), grant 2006AA01A102

Keywords: skew; event stream; parallel database; load balancing; loading strategy

Classification: TP311.133.2 [Automation and Computer Technology: Computer Software and Theory]
