Abstract
Person re-identification is an important research direction in computer vision, aiming to identify and track the same person across different surveillance cameras. Because video frames exhibit a variety of temporal relationships, from which motion patterns and fine-grained features of the target can be extracted, video-based re-identification offers richer spatio-temporal cues than image-based re-identification and is closer to real-world applications. The key problem is how to mine these spatio-temporal cues as features for video re-identification. This paper proposes a Transformer-based long and short term temporal relationship network, the Long and Short Time Transformer (LSTT), for video-based person re-identification. The network contains long-term and short-term temporal relationship modules that extract important temporal information and strengthen the feature representation. The long-term temporal relationship module stores per-frame information in memory cues and builds global connections at every frame; the short-term temporal relationship module models the interaction between adjacent frames to learn fine-grained target information and improve the feature representation. In addition, to improve the model's adaptability to different target features, a multi-scale module containing convolution kernels of different sizes is designed. With multiple convolutional receptive fields, this module covers the target region more comprehensively and further improves the generalization of the model. Experimental results on the MARS, MARS_DL, and iLIDS-VID datasets show that the LSTT model achieves the best performance.
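The record does not include the authors' code. As a rough illustration of two ideas named in the abstract, the sketch below shows, in PyTorch, one plausible form of a multi-scale convolution block (parallel kernels with different receptive fields) and a short-term adjacent-frame interaction. All names (MultiScaleConv, ShortTermRelation, kernel_sizes) and design details are assumptions made for illustration, not the LSTT implementation described in the paper.

```python
# Hypothetical sketch of the multi-scale and short-term temporal ideas
# mentioned in the abstract; NOT the authors' implementation.
import torch
import torch.nn as nn


class MultiScaleConv(nn.Module):
    """Parallel convolutions with different kernel sizes (receptive fields),
    fused by summation, as a rough analogue of the multi-scale module."""
    def __init__(self, channels, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):              # x: (B*T, C, H, W) frame feature maps
        return sum(branch(x) for branch in self.branches)


class ShortTermRelation(nn.Module):
    """Attends from each frame to its next neighbour, a rough analogue of
    modelling interactions between adjacent frames."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats):          # feats: (B, T, D) per-frame features
        neighbours = torch.roll(feats, shifts=-1, dims=1)   # next frame
        out, _ = self.attn(query=feats, key=neighbours, value=neighbours)
        return feats + out             # residual fusion


if __name__ == "__main__":
    frames = torch.randn(2 * 8, 256, 16, 8)      # 2 tracklets of 8 frames
    print(MultiScaleConv(256)(frames).shape)     # torch.Size([16, 256, 16, 8])

    seq = torch.randn(2, 8, 512)                 # pooled per-frame features
    print(ShortTermRelation(512)(seq).shape)     # torch.Size([2, 8, 512])
```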
Authors
何智敏
钱江波
严迪群
叶绪伦
王翀
HE Zhi-min; QIAN Jiang-bo; YAN Di-qun; YE Xu-lun; WANG Chong (Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China; Zhejiang Key Laboratory of Mobile Network Application Technology, Ningbo, Zhejiang 315211, China)
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journal
2024, No. 8, pp. 2746-2757 (12 pages)
Acta Electronica Sinica
Funding
National Natural Science Foundation of China (No. 62271274)
Ningbo Science and Technology Project (No. 2024Z004, No. 2023Z059).
Keywords
video-based person re-identification
Transformer
long-term temporal relationship
short-term temporal relationship
multi-scale module