
Design of a Soft-NMS-Based Accelerator for Candidate Box Redundancy Removal (cited by: 8)

A redundancy-reduced candidate box accelerator based on soft non-maximum suppression
Abstract Object detection tasks usually use the non-maximum suppression (NMS) algorithm to remove redundant candidate boxes from the output of a convolutional neural network. Hard-NMS directly deletes every candidate box whose overlap with a higher-scoring box exceeds a predefined threshold. Soft-NMS instead gradually decays the scores of such boxes, which avoids mistakenly deleting candidate boxes that belong to overlapping objects in the image and improves detection accuracy. However, the frequent score updates make Soft-NMS more complex than Hard-NMS. To achieve high-accuracy, low-latency, and low-power candidate box redundancy removal, this paper proposes a Soft-NMS-based architecture that uses logarithmic functions to simplify complex floating-point calculations and a two-level optimization structure, combining fine-grained pipelining with coarse-grained parallelism, to further improve throughput. The architecture was evaluated on a Xilinx KU-115 FPGA development board. The results show a power consumption of 6.107 W and a latency of 168.95 μs for processing 992 candidate boxes (the journal's English abstract states 1000). Compared with a CPU implementation of Soft-NMS, the architecture achieves a 36-fold performance improvement and a performance-per-watt ratio 264 times that of the CPU.
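The difference between Hard-NMS and Soft-NMS described in the abstract can be illustrated with a minimal software sketch. This is a plain reference version of the commonly published Soft-NMS formulation (linear and Gaussian score decay), not the paper's FPGA architecture and not its logarithm-based optimization. The function names `iou` and `soft_nms`, the `(x1, y1, x2, y2)` box layout, and the default thresholds are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: List[Box],
    scores: List[float],
    method: str = "linear",      # "hard", "linear", or "gaussian"
    iou_thresh: float = 0.3,     # illustrative default, not from the paper
    score_thresh: float = 0.001, # boxes decayed below this are dropped
    sigma: float = 0.5,          # Gaussian decay parameter (illustrative)
) -> List[Tuple[int, float]]:
    """Return (box index, final score) pairs in selection order."""
    scores = list(scores)  # work on a copy; scores are updated in place
    remaining = list(range(len(boxes)))
    kept: List[Tuple[int, float]] = []
    while remaining:
        # Select the highest-scoring box that is still a candidate.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if method == "hard":
                # Hard-NMS: delete the box outright above the threshold.
                if overlap > iou_thresh:
                    scores[i] = 0.0
            elif method == "linear":
                # Soft-NMS (linear): decay the score in proportion to overlap.
                if overlap > iou_thresh:
                    scores[i] *= 1.0 - overlap
            else:
                # Soft-NMS (Gaussian): continuous decay for every overlap.
                scores[i] *= math.exp(-(overlap * overlap) / sigma)
        # Drop candidates whose score has fallen below the cut-off.
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return kept
```

With two heavily overlapping boxes and one distant box, `method="hard"` discards the lower-scoring overlapping box, while `method="linear"` keeps it with a reduced score. Every selected box triggers a score update across all remaining candidates, which is the extra work the abstract refers to when it calls Soft-NMS more complex than Hard-NMS.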
Authors LI Jing-lin; JIANG Jing-fei; DOU Yong; XU Jin-wei; WEN Dong (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
Source Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 4, pp. 586-593 (8 pages)
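Funding National Science and Technology Major Project of China on Core Electronic Devices, High-end Generic Chips and Basic Software (2018ZX01028101).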
Keywords reconfigurable computing; object detection; non-maximum suppression