Abstract
Human action recognition (HAR) is a key technology for understanding pedestrian intent from video captured by unmanned aerial vehicles (UAVs). However, UAV platforms have limited computing power, and existing action recognition methods are inefficient. This paper proposes a lightweight spatial grouping attention graph convolutional network (SGA-GCN) that reduces network depth to improve efficiency while maintaining recognition accuracy. To capture the body parts that best represent global motion, a spatial grouping attention module enhances local features that are highly similar to the global features. Moreover, because joint and bone features alone cannot effectively distinguish actions with similar motion trajectories, a high-order feature encoding of skeletal angles is constructed; it captures changes in the angles between limb joints, which better reflect subtle motion differences, and thereby improves feature representation. Finally, to address the low frame rate of UAV aerial video, a linear frame interpolation scheme based on inter-frame differences is proposed to increase the information content of each sample. Experimental results on the UAV-Human dataset show that, compared with existing state-of-the-art (SOTA) methods, the proposed method performs better in recognition rate, parameter count, training time, and execution time.
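The abstract names two preprocessing ideas without giving formulas: angle features between limb joints, and linear frame interpolation driven by inter-frame differences. The following is a minimal sketch of how such steps might look, not the authors' implementation. The function names, the mean-displacement difference measure, the midpoint insertion rule, and the `threshold` parameter are all assumptions made for illustration.

```python
import math


def joint_angle(a, b, c):
    """Angle in radians at joint b between segments b->a and b->c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # assumption: a zero-length segment yields angle 0
    cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.acos(max(-1.0, min(1.0, cos)))


def frame_difference(f0, f1):
    """Mean Euclidean displacement of joints between two frames (assumed measure)."""
    return sum(math.dist(p, q) for p, q in zip(f0, f1)) / len(f0)


def interpolate_sequence(frames, threshold):
    """Insert a linear midpoint frame between consecutive frames whose
    difference exceeds `threshold` (hypothetical rule, for illustration)."""
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        if frame_difference(prev, nxt) > threshold:
            mid = [tuple((x + y) / 2 for x, y in zip(p, q))
                   for p, q in zip(prev, nxt)]
            out.append(mid)
        out.append(nxt)
    return out
```

Each frame here is a list of `(x, y, z)` joint coordinates. Under these assumptions, the sketch adds frames only where motion between neighbouring frames is large, and leaves near-static spans unchanged.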
Authors
LIU Fang (刘芳)
HUANG Sheng (黄盛)
SHI Xiangbin (石祥滨)
ZHAO Liang (赵亮)
College of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
Source
Journal of Shenyang Aerospace University (《沈阳航空航天大学学报》)
2024, No. 4, pp. 50-58 (9 pages)
Funding
National Natural Science Foundation of China (Grant Nos. 61170185, 62372310).
Keywords
unmanned aerial vehicle
spatial grouping
action recognition
high-order feature encoding
linear interpolation