Abstract
With the growth of the Internet and the mobile Internet, new applications keep emerging, and the demands they place on big-data processing for real-time performance and high concurrency keep rising. Apache Kafka, a distributed messaging system, is widely used because it scales horizontally and delivers high throughput. For the infrastructure underpinning data services, meeting high-concurrency and real-time requirements is not enough: data quality, that is, data reliability, is also critical. However, the native consumer provided by Kafka cannot guarantee data reliability. This paper first briefly introduces the technical background of Kafka, including its components and architectural characteristics, then describes the principle and defects of the native Consumer, and finally proposes a design for a reliable consumer built on Kafka. The design is based on Kafka's low-level interface set and resolves the data quality problem that arises in the native Consumer because the act of consuming data is decoupled from the recording of the consumption position, thereby ensuring data reliability. Finally, a Kafka cluster test environment is built to verify the feasibility and correctness of the design.
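The defect named in the abstract, recording the consumption position independently of the act of consuming, can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses no Kafka API at all, the names `OffsetStore`, `consume`, and `run_with_one_crash` are hypothetical, and it is not the design proposed in the paper. It merely contrasts recording the position before processing with recording it after processing succeeds, given one simulated crash.

```python
class OffsetStore:
    """Hypothetical stand-in for wherever the consumed position is persisted."""

    def __init__(self):
        self.committed = 0  # index of the next message to read after a restart


def consume(log, store, handler, commit_before_processing):
    """Read messages starting at the committed position.

    If commit_before_processing is True, the position is recorded
    independently of whether the handler succeeds (the problematic case).
    Otherwise the position is recorded only after the handler returns.
    """
    offset = store.committed
    while offset < len(log):
        message = log[offset]
        if commit_before_processing:
            store.committed = offset + 1  # position advances first
            handler(message)              # a failure here loses the message
        else:
            handler(message)              # process first
            store.committed = offset + 1  # record position only on success
        offset += 1


def run_with_one_crash(commit_before_processing):
    """Process a 4-message log, crashing once on 'm2', then restart."""
    log = ["m0", "m1", "m2", "m3"]
    store = OffsetStore()
    processed = []
    state = {"crashed": False}

    def handler(message):
        if message == "m2" and not state["crashed"]:
            state["crashed"] = True
            raise RuntimeError("simulated crash while processing m2")
        processed.append(message)

    try:
        consume(log, store, handler, commit_before_processing)
    except RuntimeError:
        pass  # the consumer process dies here; the store survives
    consume(log, store, handler, commit_before_processing)  # restart
    return processed


if __name__ == "__main__":
    print(run_with_one_crash(True))   # position recorded before processing
    print(run_with_one_crash(False))  # position recorded after processing
```

In this toy model the first run skips `m2` after the restart, while the second run reprocesses it. Recording the position after processing trades possible loss for possible reprocessing, so a real consumer would additionally need its handler to tolerate repeats.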
Source
《软件》
2016, No. 1, pp. 61-66 (6 pages)
Software