Abstract
To support the underlying image library on the domestically developed FT DSP platform, and to reduce the long computation time of the original Canny edge detection algorithm, a parallel algorithm for Canny gradient computation targeting the FT-M7002 platform is proposed. Building on the high-performance processing architecture of FT-M7002, the algorithm uses Single Instruction Multiple Data (SIMD) vectorization to strengthen the parallel processing capability of the DSP core instructions. Based on the hierarchical structure of the FT-M7002 vector memory, the memory access pattern of the parallel Canny gradient algorithm is analyzed. Discontinuous memory access is resolved by first-address offset addressing, and a double-buffering scheme is used to overlap data transfer with computation. Experimental results show that, at the same detection accuracy as the original Canny algorithm, the proposed algorithm improves overall running speed by 1.490 to 2.112 times for convolution kernel sizes of 3×3, 5×5 and 7×7, narrowing the performance gap with mainstream accelerators in digital image processing.
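The abstract only outlines the vectorization strategy. The following portable C sketch is not the authors' FT-M7002 code, and no FT-M7002 vector intrinsics are given here. It emulates SIMD lanes with a plain array to show one possible reading of "first-address offset addressing" for a 3×3 Sobel gradient. Under this reading, the left, centre and right taps of each image row are fetched by loading the same row from start addresses shifted by -1, 0 and +1. Each load is then a contiguous block of pixels, and no strided gather is needed. The names `VLEN`, `vecf`, `vload` and `sobel_gradient`, the lane width of 8, and the use of the L1 magnitude |gx| + |gy| are all assumptions made for illustration. Double buffering is not shown.

```c
#include <stddef.h>

/* Hypothetical SIMD width in lanes, chosen only for illustration;
   it is NOT taken from FT-M7002 documentation. */
#define VLEN 8

/* Emulated vector register: one float per lane. */
typedef struct { float v[VLEN]; } vecf;

/* Contiguous vector load of n lanes starting at p (unused lanes are zeroed). */
static vecf vload(const float *p, int n)
{
    vecf r = {{0}};
    for (int i = 0; i < n; ++i)
        r.v[i] = p[i];
    return r;
}

/* 3x3 Sobel gradients for the interior pixels of a w x h row-major image.
   Border pixels of gx/gy/mag are left untouched. The left/centre/right taps
   of each row come from loading the SAME row with its start address shifted
   by -1, 0 and +1 (assumed reading of "first-address offset"), so every
   load reads consecutive pixels. */
static void sobel_gradient(const float *img, int w, int h,
                           float *gx, float *gy, float *mag)
{
    for (int y = 1; y < h - 1; ++y) {
        const float *rows[3] = { img + (size_t)(y - 1) * w,
                                 img + (size_t)y * w,
                                 img + (size_t)(y + 1) * w };
        for (int x0 = 1; x0 < w - 1; x0 += VLEN) {
            /* Last chunk of a row may hold fewer than VLEN pixels. */
            int n = (w - 1 - x0 < VLEN) ? (w - 1 - x0) : VLEN;
            vecf l[3], c[3], r[3];
            for (int k = 0; k < 3; ++k) {
                l[k] = vload(rows[k] + x0 - 1, n); /* start address offset -1 */
                c[k] = vload(rows[k] + x0,     n); /* offset 0 */
                r[k] = vload(rows[k] + x0 + 1, n); /* offset +1 */
            }
            /* Each lane-wise loop body stands in for one SIMD instruction. */
            for (int i = 0; i < n; ++i) {
                float dx = (r[0].v[i] + 2.0f * r[1].v[i] + r[2].v[i])
                         - (l[0].v[i] + 2.0f * l[1].v[i] + l[2].v[i]);
                float dy = (l[2].v[i] + 2.0f * c[2].v[i] + r[2].v[i])
                         - (l[0].v[i] + 2.0f * c[0].v[i] + r[0].v[i]);
                size_t idx = (size_t)y * w + (size_t)(x0 + i);
                gx[idx] = dx;
                gy[idx] = dy;
                /* L1 gradient magnitude |gx| + |gy| (avoids a square root). */
                mag[idx] = (dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy);
            }
        }
    }
}
```

The function takes row-major float buffers of `w * h` elements. On a horizontal ramp `img[y][x] = x`, every interior pixel gets gx = 8 and gy = 0. The 5×5 and 7×7 kernels mentioned in the abstract would extend the same pattern to start-address offsets -2..+2 and -3..+3.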
Authors
GUO Hengliang, CHAI Xiaonan, HAN Lin, HE Xiaohui, SHANG Jiandong
Henan Province Supercomputing Center, Zhengzhou University, Zhengzhou 450000, China; School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China; School of Earth Science and Technology, Zhengzhou University, Zhengzhou 450000, China
Source
Computer Engineering (《计算机工程》), 2021, No. 7, pp. 37-43 (7 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
Funding
National Key Research and Development Program of China (2018YFB0505000).