Abstract
Stream join is widely used to extract key information across multi-source data streams and is an important supporting technology for big data processing. However, joining two large data streams requires evaluating the join predicate at massive scale, which easily makes the join a performance bottleneck. To improve performance, stream join systems usually scale in one of two ways: parallel or distributed. Multi-core parallel stream join systems cannot handle large-scale data streams because their scalability is bounded by the number of CPU cores. Distributed stream join systems, in turn, incur the runtime overhead of a distributed framework, which severely degrades hardware processing efficiency. To achieve efficient, large-scale scaling, this paper proposes FJoin, a stream join system that scales up by attaching an FPGA accelerator as a peripheral. The accelerator performs high-parallel flow join: after multiple stream tuples are loaded, the data in the join window flows through the accelerator once to complete all join computations. For join predicates whose logic is easy to implement on an FPGA, a large number of basic join units are chained in series into a deep join pipeline, achieving massive parallelism. The host CPU and the FPGA device cooperate on join control, dividing the continuous stream join computation into independent small-batch tasks and thereby efficiently guaranteeing the completeness of the parallelized stream join. FJoin is implemented on a platform equipped with an FPGA accelerator card. Experiments on large-scale real-world datasets show that, compared with the state-of-the-art distributed stream join system deployed on a 40-node cluster, FJoin running on a single FPGA accelerator card speeds up join computation 16-fold and delivers 5 times the system throughput, while its latency meets the requirements of real-time stream processing.
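The high-parallel flow join described above can be illustrated with a small software model. This is a minimal sketch under our own assumptions, not FJoin's hardware design: each hypothetical `JoinUnit` holds one loaded tuple, window tuples shift through the chain one unit per cycle, and `batched_join` splits the input stream into independent batches no larger than the number of units (an assumed batch size). Window sliding, tuple expiration, and the host/FPGA protocol that FJoin uses to guarantee completeness are not modeled; the names `JoinUnit`, `flow_join`, `batched_join`, and the `(timestamp, key)` tuple layout are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical tuple layout for illustration: (timestamp, key).
Record = Tuple[int, int]
Predicate = Callable[[Record, Record], bool]
Match = Tuple[Record, Record]


class JoinUnit:
    """Software model of one basic join unit: it holds one loaded tuple and
    tests the join predicate against each tuple that flows past it."""

    def __init__(self, predicate: Predicate, stored: Record) -> None:
        self.predicate = predicate
        self.stored = stored

    def step(self, incoming: Optional[Record], out: List[Match]) -> Optional[Record]:
        # None models a pipeline bubble (no valid tuple this cycle).
        if incoming is not None and self.predicate(self.stored, incoming):
            out.append((self.stored, incoming))
        return incoming  # forward the tuple unchanged to the next unit


def flow_join(batch: List[Record], window: List[Record], predicate: Predicate) -> List[Match]:
    """Load `batch` into a chain of units, then stream `window` through once."""
    if not batch:
        return []
    units = [JoinUnit(predicate, rec) for rec in batch]
    regs: List[Optional[Record]] = [None] * len(units)  # output register of each unit
    results: List[Match] = []
    # Trailing bubbles let the last window tuple drain through every unit.
    for incoming in list(window) + [None] * len(units):
        # Update from the tail so each tuple advances exactly one unit per cycle.
        for i in range(len(units) - 1, 0, -1):
            regs[i] = units[i].step(regs[i - 1], results)
        regs[0] = units[0].step(incoming, results)
    return results


def batched_join(stream: List[Record], window: List[Record],
                 predicate: Predicate, capacity: int) -> List[Match]:
    """Split `stream` into independent small batches of at most `capacity`
    tuples (assumed here to equal the number of join units) and flow-join
    each batch against the window."""
    results: List[Match] = []
    for start in range(0, len(stream), capacity):
        results.extend(flow_join(stream[start:start + capacity], window, predicate))
    return results


if __name__ == "__main__":
    def equi(a: Record, b: Record) -> bool:
        return a[1] == b[1]  # equi-join on the key field

    r = [(0, 1), (1, 2), (2, 3)]
    s = [(0, 2), (1, 3), (2, 3), (3, 5)]
    print(sorted(batched_join(r, s, equi, capacity=2)))
    # -> [((1, 2), (0, 2)), ((2, 3), (1, 3)), ((2, 3), (2, 3))]
```

Because every window tuple visits every loaded tuple exactly once, each batch produces the same matches as a nested-loop join against the window. The point of the chained design is that, in a pipelined hardware realization, the units evaluate their predicates concurrently, so one pass takes roughly `len(window) + capacity` cycles instead of `len(window) * capacity` sequential comparisons.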
Authors
Litao LIN (林力韬)
Hanhua CHEN (陈汉华)
Hai JIN (金海)
National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Source
Scientia Sinica Informationis (《中国科学:信息科学》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 2, pp. 314-333 (20 pages)
Funding
National Key Research and Development Program of China (Grant No. 2016QY02D0302)
National Natural Science Foundation of China (Grant No. 61972446)
Keywords
stream join
FPGA
stream processing
hardware acceleration
parallel computing