Abstract
To improve scene Tibetan text detection, this paper proposes a detection framework based on dual-attention YOLOv5, referred to as YOLOv5 Dual-attention. A background suppression module placed between the upsampling and downsampling layers of YOLOv5 aggregates the initial multi-scale features and suppresses background interference in the convolutional features. Convolutional attention is embedded between the neck and the detection head of YOLOv5 to strengthen convolutional feature extraction, so that the network can infer text accurately. Experimental results show that, on the two-class MSTD500 test set, the improved model YOLOv5x Dual-attention+α-IoU achieves an F1 of 84.65% for single-class Tibetan scene text detection, 12.65 percentage points higher than the best comparable result reported to date, and that it effectively reduces the likelihood of missed and false detections of text targets.
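The abstract names an α-IoU loss but does not give its form. As a rough illustration only, the sketch below uses the basic power form of α-IoU, 1 − IoU^α, for two axis-aligned boxes. The function names, the (x1, y1, x2, y2) box layout, and the default α = 3 are assumptions made for this example; the variant and hyperparameters actually used in the paper (for instance, one with extra penalty terms) may differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed layout)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=3.0):
    """Basic power-form loss 1 - IoU**alpha; alpha=3.0 is an assumed default."""
    return 1.0 - iou(pred, target) ** alpha


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 1.0, 3.0, 3.0)
    print(iou(pred, target))
    print(alpha_iou_loss(pred, target))
```

With α = 1 this reduces to the ordinary IoU loss. For α > 1, the gradient with respect to IoU, α·IoU^(α−1), is largest for boxes that already overlap the target well, which is the usual argument for more precise localization.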
Authors
才让当知 (CaiRangDangZhi)
黄鹤鸣 (HUANG He-ming)
范玉涛 (FAN Yu-tao)
樊永红 (FAN Yong-hong)
Affiliations: School of Computer Science and Technology, Qinghai Normal University, Xining 810008, China; State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, China; Key Laboratory of Tibetan Information Processing of Ministry of Education, Qinghai Normal University, Xining 810008, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2023, Issue 11, pp. 3411-3419 (9 pages)
Funding
National Natural Science Foundation of China (62066039, 62166034)
Natural Science Foundation of Qinghai Province (2022-ZJ-925)
Keywords
Tibetan text detection
scene text detection
channel attention
spatial attention
dual-attention
loss function
small target text detection