Abstract
Microservice invocation link data is an important class of data generated during the daily operation of a microservice application. It records, in the form of a link, the series of service invocations that correspond to a single user request. Because the system is distributed, this data is generated on different microservice deployment nodes, and it is currently collected by either full collection or sampling collection. Full collection incurs high data transmission and storage costs, while sampling collection may miss critical link data. To address this, a dynamic collection method for microservice invocation link data based on event-driven pipeline sampling is proposed, and a dynamic collection system for invocation link data is designed and implemented on top of the open-source software Zipkin. First, the system performs pipeline sampling on link data from different nodes that matches predefined event features: the data collection server collects the data of a link from all nodes only when some node produces data matching an event definition. Meanwhile, to handle inconsistent data generation rates across nodes, multi-threaded streaming data processing based on time windows, together with data synchronization, is used to collect and transmit data from the different nodes. Finally, because link data from different nodes reaches the server in varying order, the full-link data is synchronized and aggregated through timing alignment. Experimental results on a public microservice invocation link dataset show that, compared with full collection and sampling collection, the proposed method collects link data containing specific events such as anomalies and slow responses with higher accuracy and efficiency.
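The event-driven collection idea summarized in the abstract can be illustrated with a minimal sketch. It assumes each node keeps a short-lived buffer of spans and that an "event" means an error or a slow response. The names `Span`, `matches_event`, `collect_event_traces`, and the 500 ms threshold are hypothetical, chosen only for illustration; they are not taken from the paper or from Zipkin's API. The time-window and multi-threaded parts of the method are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One service call recorded on one node (hypothetical structure)."""
    trace_id: str
    node: str
    timestamp_ms: int   # start time of the call
    duration_ms: int
    error: bool = False


def matches_event(span, slow_threshold_ms=500):
    """Assumed event definition: an error or a slow response."""
    return span.error or span.duration_ms >= slow_threshold_ms


def collect_event_traces(node_buffers, slow_threshold_ms=500):
    """Return {trace_id: spans ordered by timestamp} for every trace in
    which at least one node buffered an event-matching span."""
    # Step 1: find the traces that triggered an event on any node.
    triggered = {
        span.trace_id
        for spans in node_buffers.values()
        for span in spans
        if matches_event(span, slow_threshold_ms)
    }
    # Step 2: pull the spans of those traces from every node's buffer.
    collected = defaultdict(list)
    for spans in node_buffers.values():
        for span in spans:
            if span.trace_id in triggered:
                collected[span.trace_id].append(span)
    # Step 3: align spans that arrived in arbitrary order by timestamp.
    return {
        trace_id: sorted(spans, key=lambda s: s.timestamp_ms)
        for trace_id, spans in collected.items()
    }
```

For example, if node B buffers a 900 ms span for trace `t1` while every span of trace `t2` is fast and error-free, only `t1` is returned, with its spans from all nodes ordered by start time. Trace `t2` is never transmitted, which is where the saving over full collection would come from in this simplified picture.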
Authors
LI Peng (李鹏)
ZHAO Zhuofeng (赵卓峰)
LI Han (李寒)
LI Peng; ZHAO Zhuofeng; LI Han (School of Information Science and Technology, North China University of Technology, Beijing 100144, China; Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data (North China University of Technology), Beijing 100144, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2022, No. 11, pp. 3493-3499 (7 pages)
Funding
National Key Research and Development Program of China (2019YFB1405100)
Beijing Natural Science Foundation (4202021)
Keywords
microservice
invocation link data
dynamic sampling
event matching
caching mechanism
service link tracing