Video anomaly detection with long-and-short-term time series correlations


Abstract

Objective: Video anomaly detection has been applied in many fields, such as manufacturing, traffic management, and security monitoring. However, detailed annotation of video data is labor-intensive and cumbersome, so many researchers have turned to weakly supervised learning. Unlike supervised learning, weakly supervised learning requires only video-level labels in the training stage, which greatly reduces the labeling workload; frame-level labels are required only for the test set. Multiple instance learning (MIL) is a powerful tool for weakly supervised video abnormal event detection. Abnormal behavior in video is highly correlated with video context information. The traditional MIL method uses a convolutional 3D network to extract video features and trains with a ranking loss function, into which sparsity and temporal smoothness constraints are introduced to integrate temporal information into the ranking model. Introducing temporal information only into the loss function is not enough. Using a temporal convolutional network to extract video context information further improves anomaly detection, but introducing temporal information globally in this way cannot sufficiently separate abnormal video clips from normal ones. The attention MIL therefore builds time-enhancing networks to learn motion features and uses an attention mechanism to incorporate temporal information into the ranking model; the learned attention weights help distinguish abnormal clips from normal ones. The spatiotemporal fusion graph network constructs spatial similarity graphs and temporal continuity graphs separately for video segments and fuses them into a spatiotemporal fusion graph, which strengthens the spatiotemporal correlations among video segments and improves the accuracy of abnormal behavior detection. The multiple instance self-training framework uses pseudo-label training, an effective strategy for improving model quality in weakly supervised learning. It constructs a two-stage training network and uses the pseudo-labels produced by the first-stage MIL to guide the training of the second-stage self-guided attention feature extractor, providing a general idea for improving model quality. However, these approaches do not fully exploit temporal correlations, because the feature representation of each instance is not fused with neighboring and global features. Abnormal events are typically sparse, sudden, and locally continuous, and insufficient temporal correlation between video segments leads to inadequate separation of normal and abnormal segments. To address this issue, this paper proposes a two-stage abnormal detection network with long-and-short-term time series association.

Method: The first stage is a long-and-short-term correlated MIL abnormal detection framework (LSC-transMIL) that applies the Transformer structure to MIL. It consists of two layers, each containing a local temporal sequence correlation attention module and a global instance correlation attention module. The former learns temporal information between each instance and its neighboring instances, while the latter focuses on the association between each instance and global information. Combining local and global attention establishes meaningful correlations among instances while reinforcing the time series correlation of consecutive video segments, and it highlights the distinctions between local and global features in the video, which makes abnormal video segments easier to distinguish from normal ones. This module generates new instance features, which are fed into the ranking model to generate video abnormal scores and pseudo-labels. In the second stage, an abnormal detection network based on a spatiotemporal attention mechanism is constructed. The SlowFast backbone network extracts video features, and the slow and fast pathway features are weighted and fused using spatiotemporal attention. The slow branch attends to the spatiotemporal information of the video frames through a spatiotemporal attention module, while the fast branch attends to temporal information through a time-dimensional attention module; the features of the two branches are then concatenated to obtain the final video features. The abnormal scores generated in the first stage are used as fine-grained pseudo-labels to train the abnormal event detection network with a pseudo-label training strategy. Furthermore, the backbone network is fine-tuned to enhance the adaptive capability of the abnormal event detection network.

Result: Experiments were conducted on two large-scale public datasets (UCF-crime and ShanghaiTech) to compare the proposed two-stage abnormal detection model with similar methods. The two-stage model achieved area under the curve (AUC) scores of 82.88% and 96.34% on the UCF-crime and ShanghaiTech datasets, respectively, improvements of 1.58% and 0.58% over other two-stage methods. Ablation experiments were conducted on both datasets: the proposed LSC-transMIL, the traditional MIL method, and the attention MIL method were compared under three backbone networks, demonstrating the effectiveness of LSC-transMIL. Qualitative and quantitative results are given for ablations comparing global attention alone with combined global and local attention, showing that combining local and global attention is effective. The role of local and global temporal correlation is visualized using heat maps.

Conclusion: This paper applies the Transformer to time series-based MIL and introduces long-and-short-term attention to highlight the differences between local abnormal events and normal events. The proposed two-stage abnormal detection network uses the abnormal scores generated in the first stage as pseudo-labels, trains a network based on the SlowFast backbone and spatiotemporal attention modules, and fine-tunes the backbone network to enhance the adaptive capability of the abnormal detection network. The proposed approach effectively improves the accuracy of abnormal event detection.
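The difference between the local and global attention modules described in the Method part can be illustrated with a minimal sketch. It assumes plain dot-product self-attention without learned projections; the function names, the `window` parameter, and the additive fusion are hypothetical choices made for illustration and are not taken from the paper's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, window=None):
    """Dot-product self-attention over per-segment feature vectors.

    features: list of T feature vectors (lists of floats), one per video segment.
    window:   None means global attention (every segment sees every segment);
              an integer w restricts segment i to segments j with |i - j| <= w,
              which is the local (short-term) case.
    """
    T, d = len(features), len(features[0])
    out = []
    for i in range(T):
        idx = [j for j in range(T) if window is None or abs(i - j) <= window]
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
            for j in idx
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * features[j][k] for w, j in zip(weights, idx)) for k in range(d)]
        )
    return out


def local_global_features(features, window=1):
    """Assumed fusion: residual sum of the input, local, and global outputs."""
    local = attend(features, window)
    glob = attend(features, None)
    return [
        [x + l + g for x, l, g in zip(f, lf, gf)]
        for f, lf, gf in zip(features, local, glob)
    ]


if __name__ == "__main__":
    # Four toy segments with 2-dimensional features (made-up values).
    segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in local_global_features(segments, window=1):
        print(row)
```

With `window=0` each segment attends only to itself, so the local output equals the input; once the window covers the whole sequence, local and global attention coincide. The interesting regime is in between, where the local branch emphasizes neighboring segments and the global branch supplies video-level context.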

Authors: Zhu Xinrui; Qian Xiaoyan; Shi Yuzhou; Tao Xudong; Li Zhiyu (College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)

Affiliation: College of Civil Aviation, Nanjing University of Aeronautics and Astronautics

Source: Journal of Image and Graphics (中国图象图形学报; indexed in CSCD and the Peking University Core Journals list), 2024, No. 7, pp. 1998-2010 (13 pages)

Funding: National Natural Science Foundation of China (61803199, U2033201).

Keywords: anomaly detection; Transformer; spatio-temporal attention; multiple instance learning (MIL); weakly supervised

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]


