Abstract
Violent incidents occur frequently in public areas, and video surveillance plays an important role in maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer more flexible surveillance. In aerial footage, however, the rapid motion of the UAV and changes in its attitude and altitude cause motion blur and large scale variation of targets. To address this problem, an attention spatial-temporal graph convolutional network (AST-GCN) is designed for recognizing violent behavior in aerial video. The method has two stages: a key frame detection network performs initial localization, and the AST-GCN network confirms the behavior from sequence features. First, for temporal localization of violence, a cascaded key frame detection network is designed that detects violence key frames based on human pose estimation and gives a preliminary estimate of when the violence occurs. Second, human skeleton information is extracted from multiple frames before and after each key frame, and the skeleton data are normalized, screened, and completed to improve robustness to different scenes and to partially missing joints; a skeleton temporal-spatial representation matrix is then constructed from the extracted skeletons. Finally, spatial-temporal graph convolution analyzes the multi-frame skeleton information, with an attention module integrated to strengthen feature representation, and completes the recognition of violent behavior. The method is validated on a self-built aerial violence dataset. Experimental results show that AST-GCN can recognize violent behavior in aerial scenes, with a recognition accuracy of 86.6%. The proposed method has important engineering value and scientific significance for applications such as aerial video surveillance and behavior understanding.
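The skeleton preprocessing stage named in the abstract (normalization, screening, completion, then assembly into a temporal-spatial representation) can be illustrated with a minimal sketch. The abstract does not give the actual rules, so everything below is an assumption made for illustration: bounding-box normalization, a joint-count threshold for screening, linear interpolation over time for completion, and a [C][T][V] layout (coordinate channel, frame, joint). The function names `normalize`, `screen`, `complete`, and `to_tensor` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Joint = Optional[Tuple[float, float]]  # None marks an undetected joint
Frame = List[Joint]                    # one skeleton with V joints


def normalize(frames: List[Frame]) -> List[Frame]:
    """Assumed rule: center each skeleton on its bounding box and divide by
    the longer box side, which reduces sensitivity to target scale."""
    out = []
    for f in frames:
        pts = [j for j in f if j is not None]
        if not pts:  # nothing detected; leave the frame for screening
            out.append(list(f))
            continue
        xs, ys = [p[0] for p in pts], [p[1] for p in pts]
        cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
        scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
        out.append([None if j is None else ((j[0] - cx) / scale, (j[1] - cy) / scale)
                    for j in f])
    return out


def screen(frames: List[Frame], min_joints: int) -> List[Frame]:
    """Assumed rule: drop skeletons with fewer than min_joints detections."""
    return [f for f in frames if sum(j is not None for j in f) >= min_joints]


def complete(frames: List[Frame]) -> List[Frame]:
    """Assumed rule: fill each missing joint by linear interpolation in time."""
    num_frames, num_joints = len(frames), len(frames[0])
    out = [list(f) for f in frames]
    for v in range(num_joints):
        known = [t for t in range(num_frames) if frames[t][v] is not None]
        for t in range(num_frames):
            if frames[t][v] is not None:
                continue
            if not known:  # joint never detected: use the normalized center
                out[t][v] = (0.0, 0.0)
                continue
            prev = max((k for k in known if k < t), default=None)
            nxt = min((k for k in known if k > t), default=None)
            if prev is None:
                out[t][v] = frames[nxt][v]    # copy the nearest later value
            elif nxt is None:
                out[t][v] = frames[prev][v]   # copy the nearest earlier value
            else:
                w = (t - prev) / (nxt - prev)
                (x0, y0), (x1, y1) = frames[prev][v], frames[nxt][v]
                out[t][v] = (x0 + w * (x1 - x0), y0 + w * (y1 - y0))
    return out


def to_tensor(frames: List[Frame]) -> List[List[List[float]]]:
    """Arrange completed skeletons as [C][T][V] with C = 2 (x, y)."""
    return [[[joint[c] for joint in f] for f in frames] for c in range(2)]


# Toy clip: 4 frames, 3 joints; one joint missing in frame 1, frame 2 empty.
clip = [
    [(100.0, 100.0), (120.0, 140.0), (80.0, 140.0)],
    [(102.0, 100.0), None, (82.0, 140.0)],
    [None, None, None],
    [(104.0, 100.0), (124.0, 140.0), (84.0, 140.0)],
]
frames = complete(screen(normalize(clip), min_joints=2))
tensor = to_tensor(frames)
```

In this toy clip the empty frame is screened out and the missing joint is interpolated from its neighbors in time, leaving a 2 x 3 x 3 array of the kind a spatial-temporal graph convolution could consume. The real pipeline may differ in every one of these choices.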
Authors
邵延华
李文峰
张晓强
楚红雨
饶云波
陈璐
SHAO Yan-hua; LI Wen-feng; ZHANG Xiao-qiang; CHU Hong-yu; RAO Yun-bo; CHEN Lu (School of Information, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; School of Information and Software Engineering, University of Electronic Science & Technology, Chengdu 610054, China)
Source
《计算机科学》
CSCD
Peking University Core Journal List (北大核心)
2022, Issue 6, pp. 254-261 (8 pages)
Computer Science
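Funding
National Natural Science Foundation of China (61601382)
Sichuan Provincial Department of Education Project (17ZB0454)
Doctoral Fund of Southwest University of Science and Technology (19zx7123)
Longshan Talent Program of Southwest University of Science and Technology (18LZX632)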
Keywords
Violence recognition
Human pose estimation
Aerial photography
Spatial-temporal graph convolution
Cascade network
Attention mechanism