Abstract
Traditional joint keypoint detection models built on the Vision Transformer (ViT) architecture usually adopt 2D sine position embedding, which tends to lose the key two-dimensional shape information of the image and leads to a drop in accuracy. Among behavior classification models, the traditional Spatio-Temporal Graph Convolutional Network (ST-GCN) suffers, under the uni-labeling partitioning strategy, from missing correlations between joint connections that are not physically linked. To address these problems, a lightweight real-time fall detection framework was designed to detect falls quickly and accurately. The framework consists of a joint keypoint detection model, RPEpose (Relative Position Encoding pose estimation), and a behavior classification model, XJ-GCN (Cross-Joint attention Graph Convolutional Network). On the one hand, the RPEpose model adopts relative position encoding to overcome the position insensitivity of the original position embedding, improving the performance of the ViT architecture in joint keypoint detection. On the other hand, an X-Joint (Cross-Joint) attention mechanism is proposed: after the partitioning strategy is reconstructed into the XJL (X-Joint Labeling) partitioning strategy, the dependencies between all joint connections are modeled to capture their latent correlations, yielding excellent classification performance with a small number of parameters. Experimental results show that, on the COCO 2017 validation set, the RPEpose model requires only 8.2 GFLOPs (Giga FLoating-point OPerations) of computation while reaching a test Average Precision (AP) of 74.3% on images with a resolution of 256×192; on the NTU RGB+D dataset under the Cross-Subject (X-Sub) split, the XJ-GCN model reaches a test Top-1 accuracy of 89.6%; and the proposed RPEpose+XJ-GCN framework achieves a prediction accuracy of 87.2% at a processing speed of 30 frames/s, demonstrating both high real-time performance and high accuracy.
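As background for the relative position encoding idea mentioned above, the following is a minimal sketch of one common formulation: a learnable 2D relative position bias added to the attention logits over an H×W token grid (a 256×192 input with 16-pixel patches gives a 16×12 grid). The abstract does not specify RPEpose's exact RPE variant, so the class, parameters, and bias-table design below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RelPosAttention(nn.Module):
    """Self-attention over an H x W token grid with a learnable 2D relative
    position bias added to the attention logits. This is one common RPE
    formulation (Swin-style); RPEpose's actual variant may differ."""

    def __init__(self, dim, num_heads, grid_size):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

        h, w = grid_size
        # One learnable bias per (relative offset, head); offsets (dy, dx)
        # range over a (2h-1) x (2w-1) table.
        self.bias_table = nn.Parameter(torch.zeros((2 * h - 1) * (2 * w - 1), num_heads))

        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        coords = torch.stack([ys.flatten(), xs.flatten()])      # (2, N)
        rel = coords[:, :, None] - coords[:, None, :]           # (2, N, N)
        rel = rel.permute(1, 2, 0)                              # (N, N, 2)
        rel[..., 0] += h - 1                                    # shift offsets to >= 0
        rel[..., 1] += w - 1
        index = rel[..., 0] * (2 * w - 1) + rel[..., 1]         # flat table index, (N, N)
        self.register_buffer("rel_index", index)

    def forward(self, x):                                       # x: (B, N, C), N = h*w
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]                        # each (B, heads, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale           # content logits
        bias = self.bias_table[self.rel_index].permute(2, 0, 1) # (heads, N, N)
        attn = (attn + bias).softmax(dim=-1)                    # position-aware attention
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Hypothetical usage for a 256x192 input patchified at stride 16:
# layer = RelPosAttention(dim=256, num_heads=8, grid_size=(16, 12))
# y = layer(torch.randn(2, 16 * 12, 256))
```

Because the bias depends only on the displacement between two tokens, not their absolute indices, the encoding preserves the 2D grid geometry that a flattened sine embedding can lose.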
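The X-Joint attention mechanism is described as modelling dependencies between all joint connections. One way to illustrate that idea is to treat every skeleton edge (bone) as a token and run self-attention across all edge tokens, so that even connections between non-physically-linked joints can exchange information. This is a hypothetical sketch under that reading; the names, shapes, and edge embedding below are assumptions, not XJ-GCN's actual design.

```python
import torch
import torch.nn as nn

class CrossJointAttention(nn.Module):
    """Illustrative edge-level attention: each skeleton edge (joint
    connection) becomes a token, and self-attention over all edge tokens
    models pairwise dependencies between every connection, including
    pairs with no physical link in the skeleton graph."""

    def __init__(self, in_channels, embed_dim, num_heads=4):
        super().__init__()
        # An edge token is built from the features of its two endpoint joints.
        self.embed = nn.Linear(2 * in_channels, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, joints, edges):
        # joints: (B, V, C) per-frame joint features; edges: list of (i, j) bone pairs
        src = torch.stack([joints[:, i] for i, _ in edges], dim=1)  # (B, E, C)
        dst = torch.stack([joints[:, j] for _, j in edges], dim=1)  # (B, E, C)
        tokens = self.embed(torch.cat([src, dst], dim=-1))          # (B, E, D)
        out, weights = self.attn(tokens, tokens, tokens)            # all-pairs edge attention
        return self.norm(tokens + out), weights                     # weights: (B, E, E)

# Toy usage on a COCO-style 17-joint skeleton (edge subset for brevity):
# joints = torch.randn(8, 17, 64)
# edges = [(0, 1), (0, 2), (1, 3), (2, 4), (5, 7), (7, 9)]
# feats, corr = CrossJointAttention(64, 128)(joints, edges)
```

The returned attention map plays the role of the latent edge-to-edge correlations that a fixed uni-labeling partition cannot represent.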
Authors
LIANG Ruiyan; YANG Hui (School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD; Peking University Core Journals
2024, No. 11, pp. 3639-3646 (8 pages)
Fund
Supported by the Open Fund of the State Key Laboratory of Advanced Optical Communication Systems and Networks.
Keywords
fall detection
joint keypoint detection
relative position encoding
Spatio-Temporal Graph Convolutional Network (ST-GCN)
attention mechanism