Abstract
To address the lack of effective development and deployment tools for deep learning applications, a component-based development framework for such applications is proposed. The framework splits an application into functions according to the type of resource each one consumes, eliminates bottlenecks with a benchmark-guided resource allocation scheme, and places functions with a stepwise bin-packing scheme that balances high CPU utilization against low GPU memory overhead. A real-time license plate number detection application built on the framework reached 82% GPU utilization in throughput-first mode and an average application latency of 0.73 s in latency-first mode; average CPU utilization across the three modes (throughput-first, latency-first, and balanced throughput/latency) was 68.8%. These results show that the framework supports configurations that balance hardware throughput against application latency, using the platform's computing resources efficiently in throughput-first mode and meeting the application's low-latency requirements in latency-first mode. Compared with MediaPipe, the framework enabled development of a faster-than-real-time multi-person pose estimation application, with the detection frame rate improved by up to 1077%. Overall, the experiments indicate that the framework is an effective solution for developing and deploying deep learning applications on CPU-GPU heterogeneous servers.
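The abstract does not spell out the allocation algorithm, so the following is only a minimal sketch of one way measurement-guided bottleneck elimination could work. It assumes that each pipeline stage has a measured per-instance throughput, that throughput scales linearly with the number of instances, and that a fixed worker budget is shared by all stages. The stage names, the numbers, and the functions `allocate_instances` and `pipeline_fps` are hypothetical and are not taken from the paper.

```python
from typing import Dict


def allocate_instances(per_instance_fps: Dict[str, float],
                       total_workers: int) -> Dict[str, int]:
    """Greedy sketch: give every stage one instance, then hand each
    remaining worker to whichever stage is currently the bottleneck.

    Assumption (not from the paper): aggregate stage throughput is
    per-instance throughput multiplied by the instance count.
    """
    if total_workers < len(per_instance_fps):
        raise ValueError("need at least one worker per stage")
    alloc = {stage: 1 for stage in per_instance_fps}
    for _ in range(total_workers - len(per_instance_fps)):
        # The bottleneck is the stage with the lowest aggregate throughput.
        bottleneck = min(alloc, key=lambda s: per_instance_fps[s] * alloc[s])
        alloc[bottleneck] += 1
    return alloc


def pipeline_fps(per_instance_fps: Dict[str, float],
                 alloc: Dict[str, int]) -> float:
    """End-to-end throughput of a pipeline is limited by its slowest stage."""
    return min(per_instance_fps[s] * alloc[s] for s in alloc)


# Hypothetical measurements in frames per second per instance.
measured = {"decode": 120.0, "detect": 40.0, "recognize": 60.0}
alloc = allocate_instances(measured, total_workers=6)
print(alloc)                          # {'decode': 1, 'detect': 3, 'recognize': 2}
print(pipeline_fps(measured, alloc))  # 120.0
```

With six workers, the sketch raises the slowest stages until all three sustain 120 frames per second. A real deployment would re-measure after each change, since scaling is rarely perfectly linear.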
Authors
LIU Xiang (刘祥)
HUA Bei (华蓓)
LIN Fei (林飞)
WEI Hongyuan (魏宏原)
College of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2024, Issue 2, pp. 526-535 (10 pages)
Funding
National Key Research and Development Program of China (2018AAA0101204).
Keywords
deep learning application
development framework
component-based development (CBD)
pipeline deployment
CPU-GPU heterogeneity