Parallel Architecture Design for OpenVX Kernel Image Processing Functions
Abstract: Traditional programmable processors are highly flexible, but their processing speed and performance fall short of application-specific integrated circuits (ASICs). Image processing workloads are diverse, compute-intensive, and repetitive, so a processor for them must balance speed, performance, and flexibility. OpenVX is an open standard for preprocessing or auxiliary processing in image and graphics processing, graph computing, and deep learning applications. Based on the core image processing function library of the latest OpenVX 1.3 standard, this paper designs and implements a programmable, extensible application-specific instruction-set processor (ASIP), the OpenVX parallel processor. First, the topological characteristics of various interconnection networks are analyzed and compared, and the hierarchically cross-connected Mesh+ (HCCM+), which performs comparatively well, is chosen as the system backbone. Processing elements (PEs) are placed at the network nodes to form a dynamically configurable 4×4 PE array, and the parallel processor is built on an efficient routing and communication scheme to realize programmable image processing. Second, the proposed architecture suits both data-parallel computing and emerging graph computing, and the two modes can be configured separately or mixed. Core vision functions and graph computing models are each mapped onto the parallel processor to verify the two modes, and image processing speed is compared across different numbers of PEs. The experimental results show that the parallel processor can map both basic core functions and high-complexity graph computing models, and that it accelerates image processing linearly in both the data-parallel and pipelined modes; with 16 PEs, the average speedup across the tested functions reaches 15.0375. The verification environment uses a 20 nm XCVU440 device, and the synthesized and implemented design runs at 125 MHz.
Authors: PAN Fengrui; LI Tao; XING Lidong; ZHANG Haocong; WU Guanzhong (School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China; School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China)
Source: Journal of Frontiers of Computer Science and Technology (CSCD; Peking University Core Journal), 2022, No. 7, pp. 1570-1582 (13 pages)
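Funding: Shaanxi Province Science and Technology Coordination Project (2015KTCQ013); Shaanxi Provincial Department of Education Collaborative Innovation Center Project (17JF032); Shaanxi Provincial Department of Education Scientific Research Program (20JY058).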
Keywords: OpenVX kernel image processing functions; application-specific instruction-set processor (ASIP); parallel processor; hierarchically cross-connected Mesh+ (HCCM+); graph computing model
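
To put the speedup reported in the abstract in context, the short sketch below converts it into a parallel efficiency (speedup divided by PE count) and a Karp-Flatt serial-fraction estimate. Only the values 15.0375 and 16 come from the abstract; the function names are hypothetical, and applying the Karp-Flatt metric to an average over several functions is an assumption made here for illustration, not an analysis taken from the paper.

```python
def parallel_efficiency(speedup: float, num_pes: int) -> float:
    """Return speedup / num_pes, i.e. the fraction of ideal linear speedup achieved."""
    if speedup <= 0 or num_pes < 1:
        raise ValueError("speedup must be positive and num_pes at least 1")
    return speedup / num_pes


def karp_flatt_serial_fraction(speedup: float, num_pes: int) -> float:
    """Estimate the serial fraction e = (1/S - 1/p) / (1 - 1/p) (Karp-Flatt metric)."""
    if speedup <= 0 or num_pes < 2:
        raise ValueError("speedup must be positive and num_pes at least 2")
    return (1.0 / speedup - 1.0 / num_pes) / (1.0 - 1.0 / num_pes)


if __name__ == "__main__":
    # Values quoted in the abstract: average speedup 15.0375 on 16 PEs.
    reported_speedup = 15.0375
    pe_count = 16
    eff = parallel_efficiency(reported_speedup, pe_count)
    frac = karp_flatt_serial_fraction(reported_speedup, pe_count)
    print(f"parallel efficiency: {eff:.4f}")
    print(f"estimated serial fraction: {frac:.5f}")
```

An efficiency close to 1 is what the abstract's claim of linear acceleration implies; the serial-fraction estimate is only as meaningful as the assumption that a single averaged speedup behaves like one workload.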
