
视频中多特征融合人体姿态跟踪 / Human pose tracking based on multi-feature fusion in videos (cited by: 5)
摘要 (Abstract, translated from the Chinese)
Objective: The tracking accuracy of existing human pose tracking algorithms still leaves room for improvement, particularly for flexibly moving arm parts. To improve tracking accuracy, this paper is the first to propose a human pose tracking method that combines visual spatio-temporal information with a deep learning network. Method: During tracking, video temporal information is used to compute the motion of the human target region, and this motion information propagates the part pose models between frames. Because methods based on image spatial features detect parts with relatively fixed shape, such as the torso and head, well but detect arms poorly, a lightweight deep learning network is constructed and trained to generate additional candidate samples for the arm parts. The network also produces arm feature-consistency probability maps, which are combined with video spatial information to compute the optimal pose of each part; the parts are then recombined into the complete pose tracking result. Result: The method is validated on two challenging human pose tracking datasets, VideoPose2.0 and YouTube Pose, achieving average arm joint tracking accuracies of 81.4% and 84.5%, respectively, a clear improvement over existing methods. Experiments on VideoPose2.0 further verify that the proposed lower-arm additional sampling and arm feature-consistency algorithms effectively improve joint tracking accuracy. Conclusion: The proposed method combining spatio-temporal information with a deep learning network effectively improves pose tracking accuracy, with an especially marked improvement for the lower-arm joints of flexibly moving poses.

Abstract (English, as published)
Objective: Human pose tracking in video sequences aims to estimate the pose of a certain person in each frame using image and video cues and to track the human pose consecutively throughout the entire video. This field has been increasingly investigated because the development of artificial intelligence and the Internet of Things makes human-computer interaction frequent. Robots or intelligent agents can understand human action and intention by visually tracking human poses. At present, researchers frequently use pictorial structure models to express human poses and use inference methods for tracking. However, the tracking accuracy of current human pose tracking methods needs to be improved, especially for flexibly moving arm parts. Although different types of features describe different types of information, the crucial point of human pose tracking lies in utilizing and combining appropriate features. We investigate the construction of effective features to accurately describe the poses of different body parts and propose a method that combines video spatial and temporal features with deep learning features to improve the accuracy of human pose tracking. This paper presents a novel human pose tracking method that effectively uses various video information to optimize human pose tracking in video sequences.
Method: An evaluation criterion is needed to track a visual target. The human pose is an articulated, complex visual target, and evaluating it as a whole leads to ambiguity. This paper therefore proposes a decomposable human pose expression model that tracks each part of the human body separately through the video and recombines the parts into an entire body pose in each single image. The human pose is expressed as a principal component analysis model of trained contour shapes, similar to a puppet, and the contour of each part pose can be calculated from key points and model parameters. As the human pose changes unpredictably, tracking while detecting improves tracking accuracy, which differs from traditional visual tracking tasks. During tracking, the video temporal information in the region of each body part is used to calculate the motion of each part pose, and the motion information then propagates the part contour from each frame to the next. The propagated parts are treated as body part candidates in the current frame for subsequent calculation. During propagation, background motion can disturb and pollute the foreground motion information, causing deviations in the part candidates obtained through propagation. To limit the influence of such deviations, a pictorial structure-based method is adopted to generate additional human pose candidates, which are then decomposed into body part poses for part tracking and optimization. The pictorial structure-based method detects parts with relatively fixed shape, such as the trunk and head, well, whereas its detection of arms is poor because arms move flexibly and their shapes change substantially and frequently. The problem of arm detection therefore has to be solved. A lightweight deep learning network is constructed and trained to generate probability maps for the key points of the lower arms, and sampling from the generated probability maps yields additional candidates for the lower-arm poses. The propagated and generated part pose candidates then need to be evaluated. The proposed evaluation method considers both image spatial information and deep learning knowledge. The spatial information includes color and contour likelihoods: the color likelihood ensures the consistency of part color during tracking, and the contour likelihood ensures the consistency of the part model contour with the image contour features. The proposed deep learning network also generates probability maps of lower-arm feature consistency for each side, revealing the image feature consistency of each lower-arm candidate. The spatial and deep learning features work together to evaluate and optimize the pose of each part, and the optimized parts are recombined into an integrated human pose, where implausible recombined poses are eliminated by the shape constraints of the decomposable human model. The recombined, optimized entire pose is the tracking result for the current frame and is decomposed and propagated to the next frame for subsequent tracking.
Result: Two publicly available, challenging human pose tracking datasets, VideoPose2.0 and YouTube Pose, are used to verify the proposed method. On the VideoPose2.0 dataset, the key point accuracies for shoulders, elbows, and wrists are 90.5%, 82.6%, and 71.2%, respectively, and the average key point accuracy is 81.4%. These results are higher than those of state-of-the-art methods, such as the method based on a conditional random field model (by 15.3%), the method based on a tree-structure reasoning model (by 3.9%), and the method based on a max-margin Markov model (by 8.8%). On the YouTube Pose dataset, the key point accuracies for shoulders, elbows, and wrists are 86.2%, 84.8%, and 81.6%, respectively, and the average key point accuracy is 84.5%. These results are higher than those of state-of-the-art methods, such as the method based on a flowing ConvNets model (by 13.7%), the method based on a dependent pairwise relation model (by 1.1%), and the method based on a mixed part sequence reasoning model (by 15.9%). The proposed crucial algorithms of additional sampling and feature consistency for the lower arms are verified on the VideoPose2.0 dataset, improving the tracking accuracy of the lower-arm joints by 5.2% and 31.2%, respectively.
Conclusion: Experimental results show that the proposed human pose tracking method, which uses spatio-temporal cues coupled with deep learning probability maps, effectively improves pose tracking accuracy, especially for the flexibly moving lower arms.
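The frame-to-frame propagation step (using the motion in each part's region to carry the part's key points to the next frame) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes a dense optical-flow field has already been computed by some standard method, and the helper name `propagate_keypoints` and the median-window scheme are this sketch's own choices (the median is one simple way to damp the background-motion pollution the abstract mentions).

```python
import numpy as np

def propagate_keypoints(keypoints, flow, win=7):
    """Shift each part key point by the median flow in a window around it.

    keypoints : (N, 2) array of (x, y) positions in the current frame
    flow      : (H, W, 2) dense flow field from frame t to frame t+1
    win       : half-width of the sampling window; taking the median over
                the window suppresses background-flow pixels that leak
                into the part region
    """
    H, W = flow.shape[:2]
    out = np.empty((len(keypoints), 2), dtype=float)
    for i, (x, y) in enumerate(keypoints):
        x0, x1 = max(int(x) - win, 0), min(int(x) + win + 1, W)
        y0, y1 = max(int(y) - win, 0), min(int(y) + win + 1, H)
        patch = flow[y0:y1, x0:x1].reshape(-1, 2)
        dx, dy = np.median(patch, axis=0)
        out[i] = (x + dx, y + dy)
    return out
```

The propagated key points would then be turned back into part contours via the shape model and treated as candidates in the new frame.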
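The additional-sampling step for the lower arms (drawing extra candidates from the network's key-point probability maps) can be sketched like this. Again a minimal sketch under assumptions: the probability map is taken as given (the abstract does not specify the network architecture), and the function name `sample_candidates` is hypothetical.

```python
import numpy as np

def sample_candidates(prob_map, n, rng=None):
    """Draw n pixel locations with probability proportional to prob_map.

    prob_map : (H, W) non-negative map for one lower-arm key point,
               as produced by the lightweight network
    Returns an (n, 2) array of (x, y) candidate positions; these
    supplement the candidates obtained by temporal propagation.
    """
    rng = np.random.default_rng(rng)
    p = prob_map.ravel().astype(float)
    p /= p.sum()                      # normalize to a distribution
    idx = rng.choice(p.size, size=n, p=p)
    ys, xs = np.unravel_index(idx, prob_map.shape)
    return np.stack([xs, ys], axis=1)
```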
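Finally, the evaluation step combines the color likelihood, the contour likelihood, and the deep feature-consistency value for each candidate. The abstract does not publish the exact fusion rule, so the weighted log-space sum below, the equal default weights, and the name `score_candidates` are all assumptions of this sketch.

```python
import numpy as np

def score_candidates(color_lik, contour_lik, deep_consistency,
                     weights=(1.0, 1.0, 1.0)):
    """Fuse the three per-candidate cues and return the best candidate.

    color_lik, contour_lik : spatial likelihoods per candidate
    deep_consistency       : per-candidate values read from the network's
                             feature-consistency probability map
    Combining in log space treats the cues as (roughly) independent;
    eps guards against log(0) for zero-likelihood candidates.
    """
    eps = 1e-12
    terms = [np.log(np.asarray(t, dtype=float) + eps)
             for t in (color_lik, contour_lik, deep_consistency)]
    score = sum(w * t for w, t in zip(weights, terms))
    return int(np.argmax(score)), score
```

The winning part poses per side would then be recombined into a full pose and filtered by the shape constraints of the decomposable model.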
Authors: 马淼 (Ma Miao), 李贻斌 (Li Yibin), 武宪青 (Wu Xianqing), 高金凤 (Gao Jinfeng), 潘海鹏 (Pan Haipeng) (Zhejiang Sci-Tech University, Hangzhou 310018, China; Shandong University, Jinan 250100, China)
Source: Journal of Image and Graphics (《中国图象图形学报》; CSCD, Peking University Core), 2020, No. 7, pp. 1459-1472 (14 pages)
Funding: National Natural Science Foundation of China (61803339, 61673245); Natural Science Foundation of Zhejiang Province (LQ19F030014, LQ18F030011); Zhejiang Sci-Tech University Young Innovation Special Project (2019Q035)
Keywords: human pose tracking; visual target tracking; human-computer interaction; deep learning network; probability map for joints

