Abstract
Mining the complementary information of thermal infrared and visible light data can effectively improve the robustness of visual tracking in complex environments. However, most methods extract single-modality features independently during feature extraction, ignoring the important role of multi-layer feature modeling in accurately locating the target. To address this problem, this paper proposes an RGBT tracking method based on a multi-layer attention mechanism. First, multi-modal image pairs are fed into the backbone network to extract the deep features of the two modalities, and a modality attention module is introduced at each feature-extraction layer to filter out inaccurate multi-modal information, effectively realizing multi-level, multi-modal feature modeling. In addition, to suppress noise and redundant information in the fused multi-modal features, a modality fusion module is proposed to further perform adaptive fusion of the multi-modal features and obtain more discriminative multi-modal features. Experiments on two public datasets show that the proposed method achieves high accuracy and fast tracking on the RGBT tracking task.
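The abstract only names the two modules; as an illustration, the following is a minimal PyTorch sketch of how a per-layer modality attention module and an adaptive modality fusion module of this kind could be structured. All class names, layer choices, and tensor shapes are assumptions for exposition, not the authors' published design.

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Per-layer modality attention (hypothetical design): channel gates
    computed from a joint descriptor re-weight each modality's feature
    map, suppressing unreliable multi-modal responses."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                           # global descriptor
            nn.Conv2d(channels * 2, channels // reduction, 1), # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels * 2, 1), # excite
            nn.Sigmoid(),                                      # per-channel gates in (0, 1)
        )

    def forward(self, rgb, tir):
        gates = self.fc(torch.cat([rgb, tir], dim=1))
        g_rgb, g_tir = gates.chunk(2, dim=1)
        return rgb * g_rgb, tir * g_tir

class ModalityFusion(nn.Module):
    """Adaptive fusion (hypothetical design): predicts per-pixel weights
    for the two modalities and mixes them, so noisy or redundant
    responses in either modality contribute less to the fused feature."""
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Sequential(
            nn.Conv2d(channels * 2, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),  # two spatial weight maps summing to 1
        )

    def forward(self, rgb, tir):
        w = self.weight(torch.cat([rgb, tir], dim=1))
        return w[:, 0:1] * rgb + w[:, 1:2] * tir

if __name__ == "__main__":
    rgb = torch.randn(1, 256, 22, 22)   # e.g., one backbone layer's RGB feature
    tir = torch.randn(1, 256, 22, 22)   # matching thermal feature
    rgb_f, tir_f = ModalityAttention(256)(rgb, tir)
    fused = ModalityFusion(256)(rgb_f, tir_f)   # (1, 256, 22, 22)
```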
Authors
WU Yi, ZHAI Sulan, LIU Lei (School of Mathematical Sciences, Anhui University, Hefei 230601, China; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei 230601, China)
Source
Journal of Anqing Normal University (Natural Science Edition) (《安庆师范大学学报(自然科学版)》)
2024, No. 2, pp. 77-83 (7 pages)
Funding
National Natural Science Foundation of China (62076003)
Open Project of the School of Mathematical Sciences, Anhui University (KF2019A03)