A Design of Reliable Consumer Based on Kafka (cited by: 37)
Abstract: With the growth of the Internet and the mobile Internet, new applications keep emerging, and the demands they place on big-data processing for real-time response and high concurrency keep rising. Apache Kafka, a distributed messaging system, is widely used because it scales horizontally and delivers high throughput. For the infrastructure that underpins data services, data quality, that is, data reliability, is as critical as high concurrency and real-time performance. However, the native data consumer that Kafka provides cannot guarantee data reliability. This paper first briefly introduces Kafka's components and architectural characteristics as technical background, then explains how the native Consumer works and where it falls short, and finally proposes a design for a reliable consumer built on Kafka. The design uses Kafka's low-level interface set and resolves the data quality problem that arises in the native Consumer because the act of consuming data and the recording of the consumption position (the offset) are independent of each other, thereby guaranteeing data reliability. A Kafka cluster test environment was set up to verify the feasibility and correctness of the design.
Authors: Wang Yan (王岩), Wang Chun (王纯)
Source: Software (《软件》), 2016, No. 1, pp. 61-66 (6 pages)
Keywords: Kafka; data reliability; ZooKeeper; real time
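
The failure mode named in the abstract, consuming data and recording the consumption position as two independent actions, can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions and not the authors' design: it uses no Kafka client at all, every name in it (`OffsetStore`, `run_consumer`, `simulate`, `commit_first`) is hypothetical, and it models just one simplified case, in which the position is recorded before the message has actually been handled.

```python
from typing import Callable, List


class OffsetStore:
    """Hypothetical stand-in for wherever the consumed position is persisted."""

    def __init__(self) -> None:
        self.committed = 0  # offset of the next message to read after a restart


class Crash(Exception):
    """Simulates the consumer process dying while it handles a message."""


def run_consumer(
    log: List[str],
    store: OffsetStore,
    handle: Callable[[str], None],
    commit_first: bool,
) -> None:
    """Read from the last committed offset to the end of the log.

    commit_first=True  records the position independently of processing
                       (the decoupled behaviour the abstract criticises).
    commit_first=False records the position only after handle() succeeded.
    """
    offset = store.committed
    while offset < len(log):
        if commit_first:
            store.committed = offset + 1
        handle(log[offset])
        if not commit_first:
            store.committed = offset + 1
        offset += 1


def simulate(commit_first: bool) -> List[str]:
    """Run once, crash while handling "m2", restart, and return what was processed."""
    log = ["m0", "m1", "m2", "m3"]
    store = OffsetStore()
    processed: List[str] = []
    state = {"crashed": False}

    def handle(msg: str) -> None:
        # Fail exactly once, on the first attempt to handle "m2".
        if msg == "m2" and not state["crashed"]:
            state["crashed"] = True
            raise Crash(msg)
        processed.append(msg)

    try:
        run_consumer(log, store, handle, commit_first)
    except Crash:
        pass  # the process "died"; only the committed offset survives
    run_consumer(log, store, handle, commit_first)  # restart
    return processed


if __name__ == "__main__":
    print("decoupled commit:    ", simulate(commit_first=True))
    print("commit after success:", simulate(commit_first=False))
```

In this simulation the decoupled variant never processes `m2`, because the position had already moved past it when the crash happened, while the variant that records the position only after successful handling re-reads `m2` on restart. Tying the two actions together in this way gives at-least-once behaviour; the abstract does not say which specific mechanism the paper uses to achieve its guarantee.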
