
Optimized Realization of Sobel Edge Detection Algorithm for FT-M7002 (cited by: 5)
Abstract: Edge detection is an important image analysis method in image processing and computer vision. The Sobel operator is commonly used for coarse-precision edge extraction and is widely applied in image edge detection. With the development of the domestic FeiTeng (FT) series of high-performance Digital Signal Processors (DSP), demand for the FT platform in image processing keeps growing, and high-performance image processing algorithms targeting the FT platform are urgently needed. To address this, the Sobel edge detection algorithm is optimized with vector parallelism on the FT-M7002 platform. The Single Instruction Multiple Data (SIMD) instructions built into the FT-M7002 processor are used to exploit the data-level parallelism in the algorithm, and a parallel conversion interface between character and integer data is designed and implemented. Loop unrolling is used to improve the instruction cycle count, and Direct Memory Access (DMA) matrix transposition resolves the problem of non-contiguous memory access. Double buffering runs data transfer in parallel with kernel computation, which hides the time gap between the two. The original and optimized Sobel algorithms are compared across several convolution kernel sizes and image sizes. The results show that the optimized algorithm achieves a speedup of 1.66 to 3.14 times over the original algorithm. In addition, compared with results obtained on the TMS320C6678 processor, the optimized algorithm on the FT-M7002 platform achieves a speedup of 1.87 to 2.08 times.
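For readers unfamiliar with the kernel being optimized, the following is a minimal scalar sketch of a 3x3 Sobel pass in C. It is not the authors' FT-M7002 implementation: the function name `sobel3x3_ref`, the `|Gx| + |Gy|` magnitude approximation, the clamp to 255, and the zeroed border are all assumptions made for illustration. It shows only the step of widening 8-bit pixels to integers before the arithmetic, and the inner loop that a SIMD or loop-unrolled version would target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, not the paper's vectorized code.
 * Computes a 3x3 Sobel gradient magnitude, approximated as |Gx| + |Gy|
 * and clamped to 255, for a row-major 8-bit image of width * height pixels.
 * Border pixels of dst are set to 0 (an assumption of this sketch). */
void sobel3x3_ref(const uint8_t *src, uint8_t *dst, int width, int height)
{
    assert(src != NULL && dst != NULL && width > 0 && height > 0);

    for (int i = 0; i < width * height; ++i) {
        dst[i] = 0;
    }

    for (int y = 1; y + 1 < height; ++y) {
        const uint8_t *r0 = src + (y - 1) * width; /* row above   */
        const uint8_t *r1 = src + y * width;       /* current row */
        const uint8_t *r2 = src + (y + 1) * width; /* row below   */

        /* Inner loop: each output pixel is independent of the others,
         * which is the data-level parallelism a SIMD version exploits. */
        for (int x = 1; x + 1 < width; ++x) {
            /* Widen 8-bit pixels to int so sums and differences
             * cannot overflow or wrap. */
            int a = r0[x - 1], b = r0[x], c = r0[x + 1];
            int d = r1[x - 1],            f = r1[x + 1];
            int g = r2[x - 1], h = r2[x], k = r2[x + 1];

            int gx = (c + 2 * f + k) - (a + 2 * d + g); /* horizontal gradient */
            int gy = (g + 2 * h + k) - (a + 2 * b + c); /* vertical gradient   */

            int mag = abs(gx) + abs(gy);
            dst[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

On a 4x4 image whose left two columns are 0 and right two columns are 10, the interior pixels next to the vertical step come out as 40 (Gx = 40, Gy = 0), while all border pixels stay 0.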
Authors: FAN Mingliang; GUO Zihan; CHAI Xiaonan; SHANG Jiandong (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2022, Issue 6, pp. 193-199 (7 pages)
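Funding: Sub-project of the National Key Research and Development Program of China, "Research on Key Technologies of a Global Earth Observation Results Management and Sharing Service System" (2018YFB0505000).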
Keywords: edge detection; Sobel operator; high-performance Digital Signal Processor (DSP); vector parallel; loop unrolling