Abstract
Cross-modal person re-identification suffers from a large modality discrepancy between images, and most existing methods rely on pixel alignment and feature alignment to match them. To further improve matching accuracy between images of the two modalities, a multi-input dual-stream network model based on a dynamic dual-attention mechanism was designed. Firstly, images of the same person captured by different cameras were added to each training batch, so that the neural network could learn sufficient feature information from a limited number of samples. Secondly, gray-scale images obtained by homogeneous augmentation were used as an intermediate bridge: they retain the structural information of the visible-light images while eliminating color information. Using gray-scale images weakens the network's dependence on color and thereby strengthens its ability to mine structural information. Finally, a Weighted Six-Directional triplet Ranking (WSDR) loss suitable for images of three modalities was proposed. It makes full use of cross-modal triplet relationships under different views, optimizes the relative distances among the features of the multiple modalities, and improves robustness to modality changes. Experimental results on the SYSU-MM01 dataset show that, compared with the Dynamic Dual-attentive AGgregation (DDAG) learning model, the proposed model improves the evaluation metrics Rank-1 and mean Average Precision (mAP) by 4.66 and 3.41 percentage points, respectively.
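The abstract does not give the WSDR formula, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes (1) that homogeneous augmentation converts a visible RGB image to gray using ITU-R BT.601 luma weights and replicates the result over three channels; (2) that the three modalities are visible (V), gray-scale (G) and infrared (I); and (3) that the "six directions" are the six ordered modality pairs, each contributing a batch-hard triplet term with anchors from one modality and positives/negatives from the other. The function names, the margin of 0.3 and the per-direction weights are all hypothetical.

```python
import math
from itertools import permutations

# Hypothetical modality tags: V = visible, G = gray-scale (from visible), I = infrared.
MODALITIES = ("V", "G", "I")


def to_grayscale(rgb):
    """Convert an RGB image (list of rows of (r, g, b) tuples) to 3-channel gray.

    Assumption: BT.601 luma weights. The gray value is copied to all three
    channels so the image keeps the visible image's shape but loses its color.
    """
    out = []
    for row in rgb:
        new_row = []
        for r, g, b in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_term(anchors, candidates, labels_a, labels_c, margin):
    """Batch-hard triplet loss: anchors from one modality, positives and
    negatives from another. Anchors without a positive or a negative are skipped."""
    if not anchors:
        return 0.0
    total = 0.0
    for fa, la in zip(anchors, labels_a):
        pos = [dist(fa, fc) for fc, lc in zip(candidates, labels_c) if lc == la]
        neg = [dist(fa, fc) for fc, lc in zip(candidates, labels_c) if lc != la]
        if not pos or not neg:
            continue
        # Hardest positive should be closer than hardest negative by `margin`.
        total += max(0.0, max(pos) - min(neg) + margin)
    return total / len(anchors)


def six_directional_loss(feats, labels, weights=None, margin=0.3):
    """Weighted sum of triplet terms over the 6 ordered modality pairs.

    feats:   dict modality -> list of feature vectors
    labels:  dict modality -> list of identity labels
    weights: hypothetical dict (anchor_mod, other_mod) -> float, default 1.0
    """
    weights = weights or {}
    loss = 0.0
    # permutations of 3 modalities taken 2 at a time -> exactly 6 directions.
    for m_a, m_c in permutations(MODALITIES, 2):
        w = weights.get((m_a, m_c), 1.0)
        loss += w * triplet_term(feats[m_a], feats[m_c], labels[m_a], labels[m_c], margin)
    return loss


if __name__ == "__main__":
    labels = {m: [0, 1] for m in MODALITIES}
    separated = {
        "V": [(0.0, 0.0), (10.0, 10.0)],
        "G": [(0.1, 0.0), (10.0, 10.1)],
        "I": [(0.0, 0.2), (9.9, 10.0)],
    }
    collapsed = {m: [(1.0, 1.0), (1.0, 1.0)] for m in MODALITIES}
    print(six_directional_loss(separated, labels))  # identities well separated
    print(six_directional_loss(collapsed, labels))  # all features identical
```

With well-separated identities every direction is already satisfied and the loss is zero; when all features collapse to one point, each of the six directions pays the full margin. The `weights` argument is where a direction-specific weighting would plug in; how WSDR actually weights the six directions is not stated in the abstract.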
Authors
LI Dawei; ZENG Zhiyong
College of Computer and Cyber Security, Fujian Normal University, Fuzhou, Fujian 350117, China; Digital Fujian Institute of Big Data Security Technology, Fujian Normal University, Fuzhou, Fujian 350117, China
Source
Journal of Computer Applications
Indexed in: CSCD; Peking University Core Journals
2022, No. 10, pp. 3200-3208 (9 pages)
Keywords
cross-modal
person re-identification
multi-input dual-stream network
homogeneous augmentation
Weighted Six-Directional triplet Ranking (WSDR) loss