Abstract
Edge detection is an important image analysis method in image processing and computer vision, and the Sobel operator, commonly used for coarse-precision edge extraction, is widely applied to it. As the domestic FT (Feiteng) series of high-performance Digital Signal Processors (DSPs) has developed, demand for FT platforms in image processing has grown, and high-performance image processing algorithms targeting these platforms are urgently needed. To address this, the Sobel edge detection algorithm is optimized for vector parallelism on the FT-M7002 platform. The Single Instruction Multiple Data (SIMD) instructions built into the FT-M7002 processor are used to exploit the data-level parallelism in the algorithm, and a parallelized conversion interface between character and integer data is designed and implemented. Loop unrolling is applied to make better use of instruction cycles, and Direct Memory Access (DMA) matrix transposition resolves the problem of non-contiguous memory access. Double buffering overlaps data transfer with kernel computation, hiding the time gap between the two. The original and optimized Sobel algorithms are compared across several convolution kernel sizes and image sizes. The results show that the optimized algorithm achieves a speedup of 1.66x to 3.14x over the original algorithm, and that on the FT-M7002 platform it runs 1.87x to 2.08x faster than on the TMS320C6678 processor.
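For readers unfamiliar with the operator, the scalar computation that such an optimization starts from can be sketched as follows. This is an illustrative reference only, not the authors' FT-M7002 code: the function name `sobel_3x3`, the 3x3 kernel size, the |Gx| + |Gy| magnitude approximation, and the clamp to 255 are all assumptions made for the example. The explicit `int(...)` widening stands in for the character-to-integer conversion that the abstract says is parallelized.

```python
# Scalar 3x3 Sobel reference (illustrative; not the paper's FT-M7002 code).
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel


def sobel_3x3(image):
    """Return |Gx| + |Gy|, clamped to 255, for interior pixels; borders stay 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    # Widen the 8-bit pixel to an integer before accumulating.
                    pixel = int(image[y + dy - 1][x + dx - 1])
                    gx += GX[dy][dx] * pixel
                    gy += GY[dy][dx] * pixel
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```

Each output pixel depends only on its own 3x3 input window, so the iterations of the two outer loops are independent of one another. That independence is the data-level parallelism a SIMD implementation can exploit.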
Authors
范明亮
郭子涵
柴晓楠
商建东
FAN Mingliang; GUO Zihan; CHAI Xiaonan; SHANG Jiandong (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China)
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University core journal list)
2022, Issue 6, pp. 193-199 (7 pages)
Computer Engineering
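Funding
Sub-project of the National Key R&D Program of China, "Research on Key Technologies of a Management and Sharing Service System for Global Earth Observation Results" (2018YFB0505000).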
Keywords
edge detection
Sobel operator
high-performance Digital Signal Processor (DSP)
vector parallelism
loop unrolling