
Human similar action recognition by fusing saliency image semantic features (cited by: 1)
Abstract

Objective: Human action recognition is a valuable research area in computer vision, with a wide range of applications such as security surveillance, intelligent monitoring, human-computer interaction, and virtual reality. Skeleton-based action recognition methods first extract the position coordinates of the major body joints from video or images, by hardware or software means, and then use the skeleton information for action recognition. In recent years, skeleton-based action recognition has received increasing attention because of its robustness to dynamic environments, complex backgrounds, and occlusion. Early methods usually relied on hand-crafted features for action recognition modeling, but such features generalize poorly because they lack diversity. Deep learning has since become the mainstream approach because of its powerful automatic feature extraction. Traditional deep learning methods arrange the skeleton data as joint coordinate vectors or pseudo-images, which are
directly input into recurrent neural networks (RNNs) or convolutional neural networks (CNNs) for action classification. However, RNN- and CNN-based methods lose the spatial structure of the skeleton data because they force it into a Euclidean data structure, and they cannot exploit the natural correlations among human joints, so distinguishing the subtle differences between similar actions becomes difficult. Human joints are naturally structured as graphs in non-Euclidean space, and several works have successfully adopted graph convolutional networks (GCNs) to achieve state-of-the-art performance in skeleton-based action recognition. In these methods, however, the subtle differences between joints, which are crucial for recognizing similar actions, are not explicitly learned. Moreover, the skeleton data extracted from video discard the objects that interact with the person and retain only the primary joint coordinates; the lack of image semantics and the reliance on joint sequences alone make recognizing similar actions markedly harder.

Method: Given the above factors, this work proposes the saliency image feature enhancement based center-connected graph convolutional network (SIFE-CGCN) for skeleton-based similar action recognition. The model builds on GCNs, which can fully exploit the spatial and temporal dependencies among human joints. First, the CGCN is proposed for skeleton-based similar action recognition. In the spatial dimension, a center-connection skeleton topology establishes connections between every human joint and the skeleton center to capture the small differences in joint movement between similar actions. In the temporal dimension, each frame is associated with the previous and subsequent frames in the sequence, so the number of temporally adjacent nodes is fixed at 2 and regular 1D convolution serves as the temporal graph convolution. A
basic graph convolution unit comprises a spatial graph convolution, a temporal graph convolution, and a dropout layer. For training stability, a residual connection is added to each unit. The network is formed by stacking nine such units; a batch normalization (BN) layer before the network standardizes the input data, and a global average pooling layer at the end unifies the feature dimensions. A dual-stream architecture uses the joint and bone information of the skeleton data simultaneously to extract features from multiple angles, and, because each joint plays a different role in different actions, an attention map focuses the network on the main motion joints. Second, the saliency images in the video are selected with Gaussian mixture background modeling: each frame is compared against a background model updated in real time, segmenting the image regions with considerable change and eliminating background interference. Effectively extracting semantic feature maps from the saliency images is the key to distinguishing similar actions. The Visual Geometry Group network (VGG-Net) can effectively extract the spatial structure features of objects from images; in this work, feature maps are extracted with a pre-trained VGG-Net, and a fully connected layer performs feature matching. Finally, the feature-map matching result is used to strengthen and revise the recognition result of the CGCN, improving its ability to recognize similar actions. In addition, a similarity calculation method for skeleton sequences is proposed, and a similar action dataset is established.

Result: The proposed model is compared with state-of-the-art models on the proposed similar action dataset and the Nanyang Technological University RGB+D (NTU RGB+D) 60/120 datasets; the compared methods include CNN-based, RNN-based, and GCN-based models. On the cross-subject (X-Sub) and
cross-view (X-View) benchmarks of the proposed similar action dataset, the recognition accuracy of the proposed model reaches 80.3% and 92.1%, respectively, 4.6% and 6.0% higher than that of the suboptimal algorithm. On the X-Sub and X-View benchmarks of the NTU RGB+D 60 dataset, accuracy reaches 91.7% and 96.9%, improvements of 1.4% and 0.6% over the suboptimal algorithm. Compared with the suboptimal model, the feedback graph convolutional network (FGCN), the proposed model improves recognition accuracy by 1.7% and 1.1% on the X-Sub and cross-setup (X-Set) benchmarks of the NTU RGB+D 120 dataset, respectively. In addition, a series of comparative experiments clearly demonstrates the effectiveness of the proposed CGCN, the saliency image extraction method, and the fusion algorithm.

Conclusion: This study proposes SIFE-CGCN to resolve the recognition confusion that arises with similar actions because of ambiguous skeleton features and missing image semantic information. The experimental results show that the proposed method can effectively recognize similar actions and that the overall recognition performance and robustness of the model are improved.
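The Gaussian mixture background modeling step described in the Method section can be sketched as follows. This is a simplified illustration, not the paper's implementation: a single running Gaussian per pixel stands in for the full mixture, and the frame size, learning rate, and deviation threshold are illustrative assumptions.

```python
import numpy as np

def update_background(frame, mean, var, alpha=0.05, k=2.5):
    """One step of a running per-pixel Gaussian background model.

    Pixels deviating from the background by more than k standard
    deviations are flagged as foreground (the salient region); the
    background statistics are updated online with learning rate alpha
    only where the pixel still matches the background.
    """
    diff = np.abs(frame - mean)
    foreground = diff > k * np.sqrt(var)
    bg = ~foreground
    mean[bg] = (1 - alpha) * mean[bg] + alpha * frame[bg]
    var[bg] = (1 - alpha) * var[bg] + alpha * (frame[bg] - mean[bg]) ** 2
    return foreground, mean, var

# Toy usage: a static noisy background, then a bright moving patch enters.
np.random.seed(0)
h, w = 32, 32
mean = np.zeros((h, w))
var = np.full((h, w), 4.0)
for _ in range(20):                       # let the model settle on the background
    frame = np.random.normal(0.0, 1.0, (h, w))
    _, mean, var = update_background(frame, mean, var)
frame = np.random.normal(0.0, 1.0, (h, w))
frame[8:16, 8:16] += 50.0                 # the "actor" enters this region
fg, mean, var = update_background(frame, mean, var)
print(fg[8:16, 8:16].mean())              # → 1.0 (every pixel in the patch is flagged salient)
```

In the paper's pipeline, the foreground mask produced this way selects the saliency image region that is then passed to the pre-trained VGG-Net for semantic feature matching.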
Authors: Bai Zhongyu, Ding Qichuan, Xu Hongli, Wu Chengdong (Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), CSCD, Peking University core journal, 2023, No. 9, pp. 2872-2886 (15 pages)
Funding: National Natural Science Foundation of China (61973065, 61973063); Liaoning Provincial Department of Science and Technology Joint Open Fund, State Key Laboratory of Robotics open fund project (2020-KF-12-02); Fundamental Research Funds for the Central Universities (N2226002).
Keywords: action recognition; skeleton sequence; similar action; graph convolutional network (GCN); image saliency features
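As an illustration of the center-connection topology described in the abstract, the sketch below builds a normalized adjacency matrix for a 25-joint NTU RGB+D skeleton in which every joint is additionally linked to a center joint. The bone list and the choice of the spine-middle joint as center are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

NUM_JOINTS = 25   # NTU RGB+D skeleton (assumed joint layout)
CENTER = 1        # 0-based index of joint 2, "middle of the spine" (assumed center)

# Physical bone pairs of the NTU RGB+D skeleton, 1-based as commonly listed.
BONES = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6), (8, 7),
         (9, 21), (10, 9), (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
         (16, 15), (17, 1), (18, 17), (19, 18), (20, 19), (22, 23), (23, 8),
         (24, 25), (25, 12)]

def center_connected_adjacency():
    A = np.eye(NUM_JOINTS)                       # self-loops
    for i, j in BONES:
        A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0  # physical bones
    for j in range(NUM_JOINTS):
        A[j, CENTER] = A[CENTER, j] = 1.0        # extra joint-to-center links
    # symmetric degree normalization D^{-1/2} A D^{-1/2}, standard in GCNs
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

A_hat = center_connected_adjacency()
print(A_hat.shape)                   # → (25, 25)
print(np.allclose(A_hat, A_hat.T))   # → True: the graph is undirected
```

An adjacency matrix of this form would replace the standard skeleton adjacency in each spatial graph-convolution layer, letting the network compare every joint against a common reference point, which is what allows the subtle inter-joint differences between similar actions to be captured.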
