Hummer: A Stream Computing Engine with Unikernel Support

Cited by: 4
Abstract In social computing, real-time computing plays an important role in public security, business intelligence, and public opinion monitoring, and these fields place ever higher demands on its performance. Stream computing engines, which aim to provide high throughput and low latency, have therefore become a research hotspot in big data computing. Stream processing tasks are very sensitive to latency, and the value of the data decreases rapidly as processing time increases. In traditional stream computing engine designs, the operating system, the JVM, and similar layers occupy a large share of computing resources and incur JVM overheads such as pointer chasing and transparent memory management. Such engines also fail to exploit modern CPUs efficiently and cannot use the full bandwidth of modern high-speed networks. How to improve the utilization of computing resources has thus become an urgent problem.

To address it, we propose Hummer, a high-performance real-time stream computing engine implemented in C++ with Unikernel support. Traditional operating systems such as CentOS are designed as general-purpose systems and contain a large number of services to support varied applications and hardware configurations. Many of these services, such as sound card or printer drivers, are useless to a computing engine and result in a large system size and unnecessary computation overhead. In addition, the hypervisor has to simulate clock interrupts so that a traditional operating system can work properly, which means that the operating system itself consumes most of the computing resources used when there is no workload.

First, to reduce the overhead caused by these unneeded services, Hummer uses Unikernel to bypass the traditional operating system and run directly on a hypervisor or on bare metal. Hummer also supports quick deployment and startup in a cluster. To the best of our knowledge, we are the first to apply Unikernel to the design of big data stream computing engines. Second, because local compilation and third-party library dependencies make C++ applications difficult to deploy in a cluster, we use Unikernel to package the application as an image and rely on the hypervisor to eliminate differences between machines. Third, we design a flexible task-oriented network communication scheme that decouples the network communication component from the TaskManager and runs it as a normal task. This brings benefits such as heterogeneous network support and network resource isolation. Batch processing usually prioritizes throughput and occupies almost all of the network I/O bandwidth, which significantly degrades latency-sensitive stream processing. Most existing solutions are not optimized for this situation, whereas our task-oriented network configuration addresses it by isolating the batch and stream networks.

Our experiments show that the end-to-end record processing latency of Hummer is less than 30 ms, which is 1.7x and 15.8x lower than that of Flink and Spark Streaming, respectively. Moreover, the achievable throughput of Hummer is around 2x that of Flink. The Hummer image packaged with Unikernel is only around 100 MB, and its boot time is about 2 s.
Authors LI Bing (李冰); ZHANG Zhi-Bin (张志斌); ZHONG Qiao-Ling (钟巧灵); CHENG Xue-Qi (程学旗) (CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049)
Source Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2019, No. 8, pp. 1755-1766 (12 pages)
Funding Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Category A) (XDA19020400)
Keywords big data; data stream; distributed computation; stream processing system; Unikernel system