Research and Implementation of Non-Blocking Transactional Real-Time Data Ingestion Technology
Abstract: With the advent of the big data era, traditional database systems struggle to meet the challenges of massive data processing, and distributed database systems are increasingly deployed in real applications. A distributed database system spreads data across multiple machines and uses massively parallel computing to store, manage, and analyze massive data sets. However, existing distributed database systems offer only limited support for the stringent transactional real-time data ingestion requirements of the financial sector. The main reason is that they implement distributed transaction processing with locks and two-phase commit, which rules out non-blocking data ingestion and severely degrades ingestion performance. CLAIMS, a distributed in-memory database system developed by the Institute for Data Science and Engineering at East China Normal University (ECNU), already provides real-time data analysis over relational data sets but does not yet support real-time data ingestion. To address this problem, we analyze existing data ingestion techniques and implementations based on distributed transaction processing, design a centralized transaction processing strategy that operates on metadata, and use lock-free programming to build a high-performance real-time data ingestion framework that supports distributed transactions. A hot-standby mechanism gives the system high availability. Extensive experiments with the framework implemented in CLAIMS show that it achieves high-throughput transactional real-time data ingestion while also supporting low-latency real-time queries.
Source: Journal of East China Normal University (Natural Science), 2016, No. 5, pp. 131-143, 164 (14 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Key Program of the National Natural Science Foundation of China (61332006); Shanghai Municipal Fund (13ZR1413200)
Keywords: distributed database system; real-time data ingestion; transaction processing; CLAIMS
