Research on Automatic Partitioning of Appended Data in Parallel OLTP Systems (并行OLTP系统中增量数据的自动分片技术研究)

Cited by: 4


Abstract: As data volumes have grown rapidly in recent years, more and more large applications are deployed in distributed environments. These applications rely on data partitioning, which carefully divides both the original data set and newly appended data across different nodes, to optimize the performance of parallel on-line transaction processing (OLTP) systems. This paper proposes a new data partitioning strategy for both the static data already in the system and newly generated appended data: the table dependency partitioning strategy (TDPS). The strategy first partitions the initial data according to the dependencies between tables. When new data arrive, it automatically assigns each data fragment to the most closely related partition. A series of experiments using the TPC-C benchmark shows that TDPS effectively improves system performance compared with previous methods.
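The abstract gives only the outline of the strategy, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes that "table dependency" can be modeled as a foreign-key reference to a parent row, and that the "most closely related partition" is the one already holding that parent row. The class `DependencyPartitioner`, its methods, and the modulo placement of root rows are all hypothetical names and choices introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyPartitioner:
    """Toy co-location of rows with the parent rows they reference (hypothetical)."""

    num_partitions: int
    # Maps (table, primary_key) -> partition id for every row placed so far.
    placement: dict = field(default_factory=dict)

    def place_root(self, table, key):
        """Place a row of a root table (no parent) by integer key modulo partition count."""
        pid = key % self.num_partitions
        self.placement[(table, key)] = pid
        return pid

    def place_dependent(self, table, key, parent_table, parent_key):
        """Place a row in the partition that holds the parent row it references."""
        parent = (parent_table, parent_key)
        if parent not in self.placement:
            raise KeyError(f"parent row {parent} has not been placed")
        pid = self.placement[parent]
        self.placement[(table, key)] = pid
        return pid


partitioner = DependencyPartitioner(num_partitions=2)

# Initial (static) data: warehouses are root rows, districts depend on them.
for w_id in (1, 2, 3):
    partitioner.place_root("warehouse", w_id)
partitioner.place_dependent("district", (1, 1), "warehouse", 1)

# Appended data: a new order that references district 1 of warehouse 1 is
# routed to the partition that already holds that district.
new_order_partition = partitioner.place_dependent(
    "orders", (1, 1, 3001), "district", (1, 1)
)
```

The table names follow the TPC-C schema used in the paper's experiments, but the keys and the two-partition layout are made up. The point of the design is that a transaction touching an order, its district, and its warehouse finds all three rows on one node, which is the kind of locality a dependency-based placement aims for.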

Authors: Wang Xiaoyan, Chen Jinchuan, Du Xiaoyong, Fan Xu

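Affiliations: School of Information, Renmin University of China; School of Information and Electrical Engineering, Ludong University; Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education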

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2013, No. 9, pp. 800-810 (11 pages)

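Funding: National Natural Science Foundation of China, No. 61003086; Open Fund of the State Key Laboratory of Software Development Environment, No. SKLSDE-2012KF-09; Graduate Research Fund of Renmin University of China, No. 42306176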

Keywords: data partitioning; on-line transaction processing (OLTP); appended data

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]




