Abstract
As image data sets grow and resolutions increase in practical applications, the performance of image edge detection algorithms has become the key constraint on real-time image processing. This paper studies GPU performance optimization of the Canny edge detection algorithm on the NVIDIA Tegra K1 heterogeneous computing platform. It approaches the problem from three aspects (vectorized memory access, data localization, and conditional branch optimization) and takes into account both the characteristics of the algorithm and the features of the underlying hardware architecture. Experimental results show that, compared with the CPU-based Canny edge detection algorithm in OpenCV 3.0, the optimized algorithm achieves speedups of 13.2 to 17.8 times across different image data sizes and has good detection performance.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals
2017, Issue 5, pp. 240-247 (8 pages)
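Funding
National Natural Science Foundation of China (NSFC61271370)
General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (SQKM201411417010, KM201311417001)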
Keywords
image edge detection
heterogeneous computing platform
vectorized memory access
data localization
conditional branch optimization