FJoin: an FPGA-based parallel accelerator for stream join

Abstract: Stream join is widely used to extract key information across multi-source data streams and is an important supporting technology for big data processing. However, joining two large data streams requires large-scale join predicate evaluation, which easily makes the join a performance bottleneck. To improve performance, stream join systems are usually scaled through either parallel or distributed execution. Multi-core parallel stream join systems cannot cope with large-scale data streams because their scalability is limited by the number of CPU cores, while distributed stream join systems incur the overhead of running a distributed framework, which severely reduces hardware processing efficiency. To achieve efficient large-scale scaling, this paper proposes FJoin, a stream join system that scales up by using an FPGA accelerator as a peripheral device. The accelerator performs a highly parallel flow join: after multiple stream tuples are loaded, the data in the join window flows through once to complete all join computations. For join predicates whose logic is easy to implement on an FPGA, a large number of basic join units are connected in series to form a deep join pipeline, achieving large-scale parallelism. The host CPU and the FPGA device cooperate on join control, dividing the continuous stream join computation into independent small-batch tasks and efficiently guaranteeing the completeness of the parallelized stream join. FJoin is implemented on a platform equipped with an FPGA accelerator card. Tests on large-scale real-world data sets show that, compared with the current best distributed stream join system deployed on a 40-node cluster, FJoin on a single FPGA accelerator card increases join computation speed by 16 times and achieves 5 times the system throughput, with latency that meets real-time stream processing requirements.
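The flow join described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FJoin hardware design: the function name `flow_join`, its parameters, and the shift-register model of the pipeline are hypothetical choices made for illustration. Each basic join unit holds one loaded stream tuple, and window tuples advance one unit per step, so a single pass of the window compares every loaded tuple with every window tuple.

```python
from typing import Any, Callable, List, Sequence, Tuple


def flow_join(
    batch: Sequence[Any],
    window: Sequence[Any],
    predicate: Callable[[Any, Any], bool],
) -> List[Tuple[Any, Any]]:
    """Software model of a chain of basic join units (hypothetical sketch).

    Unit i holds batch[i]. Window tuples enter at unit 0 and shift one
    unit per step, so one pass of the window evaluates the predicate on
    every (batch tuple, window tuple) pair exactly once.
    """
    n = len(batch)
    if n == 0:
        return []

    stages: List[Any] = [None] * n  # stages[i]: window tuple currently at unit i
    results: List[Tuple[Any, Any]] = []

    # Trailing bubbles (None) let the last window tuples drain through all units.
    feed = list(window) + [None] * (n - 1)
    for incoming in feed:
        # Shift step: every window tuple advances to the next unit.
        stages = [incoming] + stages[:-1]
        # In hardware all units would compare in the same cycle;
        # this loop models that parallel comparison sequentially.
        for i, w in enumerate(stages):
            if w is not None and predicate(batch[i], w):
                results.append((batch[i], w))
    return results
```

In this model the number of loaded tuples equals the number of join units, which corresponds to one small-batch task; a host loop would call `flow_join` once per batch against the current window contents.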

Authors: Litao LIN (林力韬); Hanhua CHEN (陈汉华); Hai JIN (金海) (National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)

Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology

Source: Scientia Sinica (Informationis) (《中国科学：信息科学》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 2, pp. 314-333 (20 pages)

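Funding: Supported by the National Key Research and Development Program of China (Grant No. 2016QY02D0302) and the National Natural Science Foundation of China (Grant No. 61972446).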

Keywords: stream join; FPGA; stream processing; hardware acceleration; parallel computing

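Classification: TN791 [Electronics and Telecommunications: Circuits and Systems]; TP332 [Automation and Computer Technology: Computer System Architecture]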

