Brain-Inspired CIRA-DETR Full Inference Model for Small and Occluded Object Detection

Cited by: 4
Abstract In recent years, deep learning has been applied to many image processing tasks. Its excellent performance has encouraged many researchers to apply it to object recognition along several popular directions: improving detection accuracy by updating the network structure, designing simple network models based on the Transformer, and obtaining better detection results through feature analysis. The Detection Transformer (DETR) object detection model, proposed by Facebook AI researchers in 2020, uses a simple encoder-decoder structure and treats object detection as a direct set prediction problem. DETR is simple and general, and it avoids much manual design and parameter tuning, which has attracted wide attention from academia and industry. However, DETR limits the resolution of the input feature map, and a feature map that is too small carries insufficient object information. DETR also lacks relative position information during detection inference. As a result, its detection performance on small objects and occluded objects is poor. For such objects, the entity information corresponding to the features and the relative position information between entities are key to reasoning about the target. In DETR, however, the feed-forward network (FFN) infers target information only through weighted summation and does not consider interactions between features, which has become the main factor limiting detection performance. Humans, in contrast, detect small and occluded targets easily. To address these problems, and inspired by brain cognition, we propose a novel full-inference object detection network, Capsule-Inferenced and Residual-Augmented Detection Transformers (CIRA_DETR). First, CIRA_DETR establishes an inter-layer residual information enhancement module that uses the differences between large-scale and small-scale feature maps to enhance target-related information in the small-scale feature map. This improves the convergence speed and detection performance of DETR without increasing algorithmic complexity, and raises small-object detection performance by 1.8%. Second, to match more closely the way the human brain thinks and to better model the hierarchical relationships of internal knowledge representation in neural networks, a capsule inference module is introduced into the inference over the Transformer output to mine object entity information. It uses bidirectional attention routing for forward information delivery and backward information feedback, and from this predicts the class and location of each object, which greatly alleviates the difficulty of detecting occluded objects. Finally, neuroscience suggests that anything a human sees converges, in different ways, to a continuous attractor, which may be a curve, a surface, or a hypersurface. Inspired by this continuous-attractor hypothesis, a nonlinear hypersausage mapping function is introduced into the mapping of object information. It constructs a more flexible hypersurface that describes the mapping between features and object class and location, improving the model's ability to represent targets. On the MS COCO dataset, CIRA_DETR achieves average precision of 25.8%, 48.7%, and 62.7% on small, medium, and large objects, respectively. Its small-object detection performance is comparable to that of Faster-RCNN, and both the visualization results and the performance metrics show its advantage over the original DETR model in detecting occluded objects.
Authors NING Xin (宁欣); TIAN Wei-Juan (田伟娟); YU Li-Na (于丽娜); LI Wei-Jun (李卫军) (Laboratory of Artificial Neural Networks and High-speed Circuits, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083; Image Cognitive Computing Joint Lab, Wave Group, Beijing 100083; School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049)
Source Chinese Journal of Computers (《计算机学报》), 2022, No. 10, pp. 2080-2092 (13 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding Supported by the National Natural Science Foundation of China (61901436).
Keywords object detection; DETR; Transformer; capsule network (CapsNet); neuroscience; residual network (ResNet)
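
The abstract names a hypersausage mapping function but does not give its form. As a rough illustration only: a hypersausage is usually described as the set of points lying within a fixed radius of a line segment in feature space, so its basic ingredient is a point-to-segment distance. The Python sketch below computes that distance and a simple response derived from it. The names `segment_distance`, `hypersausage_response`, and `radius`, and the linear decay, are assumptions made for this illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def segment_distance(x: Sequence[float],
                     a: Sequence[float],
                     b: Sequence[float]) -> float:
    """Euclidean distance from point x to the segment with endpoints a and b."""
    ab = [bi - ai for ai, bi in zip(a, b)]   # segment direction b - a
    ax = [xi - ai for ai, xi in zip(a, x)]   # offset of x from a
    ab_sq = sum(v * v for v in ab)
    if ab_sq == 0.0:
        t = 0.0                              # degenerate segment: a == b
    else:
        # Projection parameter of x onto the line, clamped to the segment.
        t = sum(u * v for u, v in zip(ax, ab)) / ab_sq
        t = max(0.0, min(1.0, t))
    closest = [ai + t * v for ai, v in zip(a, ab)]
    return math.dist(x, closest)


def hypersausage_response(x: Sequence[float],
                          a: Sequence[float],
                          b: Sequence[float],
                          radius: float) -> float:
    """Hypothetical response: 1 on the axis segment, 0 at or beyond `radius`.

    The linear decay is an assumption for illustration, not the paper's function.
    """
    d = segment_distance(x, a, b)
    return max(0.0, 1.0 - d / radius)


if __name__ == "__main__":
    a, b = [0.0, 0.0], [1.0, 0.0]
    print(segment_distance([0.5, 1.0], a, b))
    print(hypersausage_response([0.5, 1.0], a, b, radius=2.0))
```

Because the distance is measured to a segment rather than to a single point, the region with a positive response is elongated along the segment, which is one way to picture the more flexible decision surface the abstract refers to.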