Research on Automatic Partitioning of Appended Data in Parallel OLTP Systems

Cited by: 4
Abstract: Driven by the rapid growth of data volumes in recent years, more and more large applications are deployed in distributed environments. To optimize the performance of parallel on-line transaction processing (OLTP) systems, they rely on data partitioning to carefully divide both the original data set and newly appended data across data nodes. This paper presents a new data partitioning strategy, the table dependency partitioning strategy (TDPS), which handles both the static data already in the system and newly generated appended data. The strategy first partitions the initial data according to the dependencies between tables. When new data arrives, it automatically assigns each data fragment to the most closely related partition. A series of experiments was conducted with the TPC-C benchmark data sets and transactions; the results show that TDPS effectively improves system performance compared with previous methods.
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2013, No. 9, pp. 800-810 (11 pages)
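Funding: National Natural Science Foundation of China (No. 61003086); Open Fund of the State Key Laboratory of Software Development Environment (No. SKLSDE-2012KF-09); Graduate Research Fund of Renmin University of China (No. 42306176)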
Keywords: data partitioning; on-line transaction processing (OLTP); appended data