Multimodal spatial-temporal feature representation and its application in action recognition
Cited by: 2
Abstract  In human action recognition, fusing depth data with skeleton data through a multimodal approach can effectively improve recognition rates. To address the large volume and high redundancy of depth image data, an algorithm that reduces redundancy by extracting the action frame sequence carrying the key temporal information, the centroid motion path relaxation algorithm, is proposed, and a new spatio-temporal feature representation is designed according to the characteristics of the different modalities. Based on the distance the centroid moves between adjacent frames, the centroid motion path relaxation algorithm computes a similarity coefficient for the active regions obtained by image differencing and then discards highly similar frames, retaining the key temporal information that is sufficient to express the action. The new spatio-temporal feature representation is built from the variation of the dynamic part of the image, the coordination of the body parts during motion, and local saliency features. The method is validated on the MSR-Action3D dataset. The average cross-validated recognition rate over the three subsets is 95.7432%, which is 2.4432%, 4.7632%, 0.3432%, and 0.2132% higher than the Multi-fused, CovP3DJ, D3D-LSTM (densely connected 3D CNN and long short-term memory), and Joint Subset Selection methods, respectively. In the extended experiment on the complete dataset, the cross-validated recognition rate is 93.0403%, showing good robustness. The experiments indicate that the proposed de-redundancy algorithm improves recognition after reducing redundancy, and that the extracted features have low mutual correlation and complement each other well when combined, effectively improving classification accuracy.

Objective: Human motion recognition has been developing within computer vision and pattern recognition, with applications such as assistive human-computer interaction, motion analysis, intelligent monitoring, and virtual reality. Conventional action recognition mainly uses RGB image sequences captured by an RGB camera, which provide only two-dimensional information. To improve the ability to detect short-duration segments, feature descriptors for RGB image sequences, such as the histogram of oriented gradients (HOG), the histogram of optical flow (HOF), and three-dimensional feature pyramids, are employed to characterize human behavior. Because RGB images describe behavior only in two dimensions, some researchers exploit the fact that depth images are insensitive to ambient light and combine depth information with RGB features to describe behavior. Multimodal methods that fuse depth data and skeleton data can effectively improve action recognition rates. Depth maps are now widely used in human behavior recognition, but the collected depth data need to be optimized because of the time complexity of feature extraction and the space complexity of feature storage. To resolve these problems, we develop an algorithm that optimizes the depth map frame sequence and reduces resource consumption, and we also propose a new representation of motion features based on the motion information of the centroid. Method: First, a temporal feature vector is extracted from the time-sequence information of the depth map sequence. The centroid motion path relaxation algorithm removes duplicate and redundant depth frames, and the temporal feature vector is concatenated with the spatial structure feature vector extracted from the skeleton to form the spatio-temporal feature input. Next, spatial features are extracted from a three-channel spatial feature map built by splicing the original skeleton point coordinates. Finally, the fused probabilities of the spatio-temporal features and the spatial features are used for classification and recognition. The centroid motion path relaxation algorithm targets redundant information, the time complexity of feature extraction, and the space complexity of feature storage. For the skeleton data, a global motion-direction feature is proposed to fully reflect the integrity and coordination of limb movements. The extracted features are concatenated to obtain the spatio-temporal feature vector and are fused and enhanced with the three-channel spatial feature map built from the original skeleton point coordinates. The effectiveness of the method is verified on the MSR-Action3D dataset. Result: In experimental setting 1, the proposed method is 0.8260% higher than the depth motion map-local binary pattern (DMM-LBP) algorithm, 1.0152% higher than DMM-CRC (collaborative representation classifier), 3.4501% higher than the DMM-gradient local auto-correlation (DMM-GLAC) algorithm, 0.6058% higher than the EigenJoint algorithm, and 10.6245% higher than the space-time auto-correlation of gradients (STACOG) algorithm. After redundancy removal, the result in experimental setting 1 is a further 0.1261% higher. Cross-validation in experimental setting 2 shows that the average classification recognition rate over the three subsets is 95.7432%, which is 2.4432% higher than the Multi-fused method, 4.7632% higher than CovP3DJ, 0.3432% higher than D3D-LSTM, and 0.2132% higher than Joint Subset Selection. On the full dataset, the method is 2.0303% higher than the low-latency method, 0.2403% higher than the combination-of-deep-models method, and 2.3403% higher than the complex network coding method. Overall, the average cross-validated recognition rate over the three subsets in setting 2 is 95.7432%, and the recognition rate on the complete dataset is 93.0403%. Conclusion: The proposed algorithm improves recognition after redundancy reduction, and the extracted features have low mutual correlation, which effectively improves classification accuracy.
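The abstract describes the frame de-redundancy step only at a high level, so a minimal sketch of the idea is given below. It is an illustrative reconstruction, not the authors' released code: it assumes depth frames supplied as NumPy arrays, stands in an intersection-over-union measure for the paper's similarity coefficient of the differenced active regions, and all thresholds, function names, and parameters are assumptions made for the example.

```python
# Sketch of a centroid-motion-based frame de-redundancy step, in the spirit of
# the "centroid motion path relaxation" described in the abstract.
# Assumptions: depth frames are 2-D NumPy arrays; the similarity coefficient is
# approximated by IoU of the differenced active regions; thresholds are made up.
import numpy as np


def active_region(prev_frame: np.ndarray, frame: np.ndarray,
                  diff_thresh: float = 10.0) -> np.ndarray:
    """Binary mask of the 'active' part obtained by differencing adjacent frames."""
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    return diff > diff_thresh


def centroid(mask: np.ndarray) -> np.ndarray:
    """Centroid (row, col) of the active region; image centre if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return np.array(mask.shape, dtype=np.float32) / 2.0
    return np.array([ys.mean(), xs.mean()], dtype=np.float32)


def similarity(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Similarity coefficient of two active regions (here: intersection over union)."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(mask_a, mask_b).sum()) / float(union)


def relax_frames(depth_seq, motion_thresh: float = 2.0,
                 sim_thresh: float = 0.9) -> list:
    """Indices of key frames: a frame is dropped when the centroid of its active
    region has barely moved and the region is highly similar to that of the last
    kept frame, so only the key temporal information is retained."""
    keep = [0]
    prev_mask = active_region(depth_seq[0], depth_seq[1])
    prev_cent = centroid(prev_mask)
    for t in range(1, len(depth_seq) - 1):
        mask = active_region(depth_seq[t], depth_seq[t + 1])
        cent = centroid(mask)
        moved = float(np.linalg.norm(cent - prev_cent))
        if moved < motion_thresh and similarity(prev_mask, mask) > sim_thresh:
            continue  # redundant: little centroid motion, near-identical active part
        keep.append(t)
        prev_mask, prev_cent = mask, cent
    return keep
```

Under these assumptions the retained indices would feed the temporal feature extraction stage; the exact similarity measure, distance criterion, and thresholds used in the paper may differ from this sketch.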
Authors: Shi Haiyong; Hou Zhenjie; Chao Xin; Zhong Zhuokun (School of Computer and Artificial Intelligence, Changzhou University, Changzhou 213164, China)
Source: Journal of Image and Graphics (中国图象图形学报), indexed in CSCD and the Peking University Core Journal list, 2023, No. 4, pp. 1041-1055 (15 pages)
Funding: National Natural Science Foundation of China (61063021); Postgraduate Research Innovation Program of Jiangsu Province (KYCX21_2835)
Keywords: action recognition; centroid motion; key temporal information; spatio-temporal feature representation; multimodal fusion
