Abstract
As a multi-task learning process, object detection requires better features than image classification. Detectors that predict objects of different scales from multi-scale features have greatly outperformed detectors based on single-scale features. In addition, feature pyramid structures are used to build high-level semantic feature maps at all scales, further improving detector performance. However, such feature maps do not fully account for the complementary role that contextual information plays in semantics. Building on the SSD baseline network, a feature fusion method based on a residual attention mechanism is adopted to make full use of contextual information. Feature fusion enhances the representational capability of the high-resolution feature maps, which helps in detecting small-scale objects, and the residual attention mechanism then strengthens the key features required for prediction. Experiments on the PASCAL VOC benchmark dataset show that the proposed method achieves an mAP of 78.8% and 80.7% with input image sizes of 300×300 and 512×512, respectively.
Authors
LI Ben-gao (李本高)
WU Cong-zhong (吴从中)
XU Liang-feng (许良凤)
ZHAN Shu (詹曙)
School of Computer and Information, Hefei University of Technology, Hefei 231009, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 2, pp. 347-353 (7 pages)
Funding
National Natural Science Foundation of China (61371156).
Keywords
object detection
feature fusion
attention mechanism
multi-scale feature
contextual information