
结合姿态估计和时序分段网络分析的羽毛球视频动作识别 (Cited by: 2)

Stroke recognition in badminton videos based on pose estimation and temporal segment networks analysis
Abstract
Objective Video-based intelligent action recognition is an active topic in computer vision, and because videos cover many scene types, actions need to be recognized within their specific scene. In badminton, accurately locating and recognizing the strokes in a singles video serves diverse needs: it helps coaches analyze a player's strokes and lets users enjoy highlight collections of each stroke type, and the approach can be transferred to tennis and table tennis, which share similar characteristics. Action recognition over a long video first requires locating the action time domain, and badminton videos are exactly this kind of video in which stroke time domains must be localized. Existing work on temporal action localization assumes clear switching boundaries between adjacent actions whose foreground or background features differ markedly, as in the 50 Salads and Breakfast datasets; in a badminton video, however, there is no obvious foreground or background boundary between adjacent strokes, so such long-video methods are not suitable for localizing badminton strokes. In addition, most existing research on badminton stroke recognition works on static stroke images extracted from badminton videos, and stroke recognition on badminton meta-videos is lacking. We therefore propose a method that temporally localizes and classifies the strokes of the ball-controlling player in extracted badminton video highlights.
Method First, the regional multi-person pose estimation (RMPE) model detects the human poses in a badminton video highlight. The pose of the targeted player is isolated by adding prediction-score and position constraints that suppress irrelevant skeletons, and joint constraints on the detected pose then locate the player's arms. The racket-holding arm is distinguished from the non-holding arm by the difference in their swing amplitudes, the stroke time domain is localized from the variation of the holding arm's swing amplitude, and the located segments are extracted as stroke meta-videos. The swing amplitude of the player's arm in a frame is defined as a linearly weighted sum of the squared moduli of the upper- and lower-arm swing vectors.
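A minimal sketch of how such a per-frame swing amplitude could be computed from 2D pose keypoints is shown below. The keypoint layout, the weights w_upper and w_lower, and the reading of a "swing vector" as the frame-to-frame change of each arm segment are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def swing_amplitude(prev_kpts, cur_kpts, w_upper=1.0, w_lower=1.0):
        # prev_kpts / cur_kpts: {"shoulder": (x, y), "elbow": (x, y), "wrist": (x, y)}
        # for one arm in two consecutive frames (hypothetical keypoint layout).
        def seg(kpts, a, b):
            return np.asarray(kpts[b], dtype=float) - np.asarray(kpts[a], dtype=float)

        # Assumed reading: the swing vector of a segment is its change between frames.
        upper_swing = seg(cur_kpts, "shoulder", "elbow") - seg(prev_kpts, "shoulder", "elbow")
        lower_swing = seg(cur_kpts, "elbow", "wrist") - seg(prev_kpts, "elbow", "wrist")

        # Linearly weighted sum of the squared moduli of the two swing vectors.
        return w_upper * float(upper_swing @ upper_swing) + w_lower * float(lower_swing @ lower_swing)

Under these assumptions, the arm whose amplitude series fluctuates more strongly would be taken as the racket-holding arm, and frames where that amplitude rises above and then falls back below a threshold would bound one stroke's time domain, from which the meta-video is cut.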
Next, a dataset of badminton meta-videos is used to train CBAM-TSN, a temporal segment network (TSN) augmented with a convolutional block attention module (CBAM), to predict the stroke in each meta-video. Because TSN inherits the structure of the two-stream convolutional neural network, a two-stream representation of each meta-video, a spatial stream of RGB frames and a temporal stream of optical-flow frames, must be extracted from the dataset before training. CBAM-TSN predicts four common stroke types: forehand, backhand, overhead, and lob. Finally, overhead strokes are further classified as clears or smashes by morphological processing: a mask of the moving shuttlecock is obtained from the meta-video by image morphology, and because clear meta-videos show a continuous dynamic mask in the background area at the end of the stroke whereas smash meta-videos do not, position-related features of this mask separate the two.
Result In a badminton video highlight, a segmentation is counted as correct when a meta-video produced by the stroke-localization method and a manually extracted meta-video contain the same stroke. Intersection over union (IoU) evaluates localization, while ROC-AUC, recall, and precision evaluate classification. The IoU of stroke localization in badminton video highlights reaches 82.6%. The AUC of every stroke type (forehand, backhand, overhead, and lob) predicted by CBAM-TSN exceeds 0.98, and the micro-AUC, macro-AUC, average recall, and average precision reach 0.9908, 0.9903, 93.5%, and 94.3%, respectively. Compared with three popular action-recognition approaches on badminton stroke recognition, CBAM-TSN obtains the highest precision, micro-AUC, and macro-AUC. The final average recall and precision reach 91.2% and 91.6%, respectively, so the method can effectively locate and classify the main player's strokes in a badminton video highlight.
Conclusion We propose a stroke recognition method for badminton video highlights that combines stroke localization with stroke classification, making the recognition process more intelligent and providing important application value for sports video analysis.
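The abstract names CBAM-TSN, i.e., a TSN with a convolutional block attention module inserted. A minimal PyTorch sketch of the standard CBAM block (channel attention followed by spatial attention, after Woo et al.) is given below; where exactly the paper inserts it inside the TSN backbone is not stated here, so this shows only the attention module itself.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x):                      # x: (N, C, H, W)
            avg = self.mlp(x.mean(dim=(2, 3)))     # average-pooled channel descriptor
            mx = self.mlp(x.amax(dim=(2, 3)))      # max-pooled channel descriptor
            w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
            return x * w

    class SpatialAttention(nn.Module):
        def __init__(self, kernel_size=7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            avg = x.mean(dim=1, keepdim=True)      # channel-wise average map
            mx = x.amax(dim=1, keepdim=True)       # channel-wise max map
            w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
            return x * w

    class CBAM(nn.Module):
        def __init__(self, channels, reduction=16, kernel_size=7):
            super().__init__()
            self.channel = ChannelAttention(channels, reduction)
            self.spatial = SpatialAttention(kernel_size)

        def forward(self, x):
            return self.spatial(self.channel(x))

In TSN, each sampled snippet's RGB or optical-flow stack passes through a 2D CNN backbone; a block like this could be applied to the backbone's feature maps before the snippet-level predictions are aggregated by the segmental consensus.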
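The clear-versus-smash decision relies on whether a continuous dynamic (shuttlecock) mask appears in the background area at the end of the stroke. A rough sketch of extracting such a dynamic mask with frame differencing and OpenCV morphology is below; the thresholds, kernel size, and background-region test are illustrative assumptions, not the paper's exact procedure.

    import cv2
    import numpy as np

    def dynamic_mask(prev_gray, cur_gray, thresh=25, kernel_size=5):
        # Frame difference -> binary mask -> morphological open/close to
        # remove noise and connect the moving shuttlecock blob.
        diff = cv2.absdiff(cur_gray, prev_gray)
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        return mask

    def looks_like_clear(end_frames_gray, background_roi, min_hit_ratio=0.6):
        # end_frames_gray: grayscale frames from the end of an overhead-stroke
        # meta-video; background_roi: (x, y, w, h) of the upper background area.
        # Heuristic: a clear keeps a moving blob in the background ROI across
        # most of these frames, while a smash does not.
        x, y, w, h = background_roi
        hits = 0
        for prev, cur in zip(end_frames_gray, end_frames_gray[1:]):
            roi = dynamic_mask(prev, cur)[y:y + h, x:x + w]
            hits += int(cv2.countNonZero(roi) > 0)
        return hits / max(len(end_frames_gray) - 1, 1) >= min_hit_ratio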
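Localization quality is reported as IoU between an automatically segmented meta-video and a manually extracted one containing the same stroke. A small sketch of this temporal IoU is below; representing the two segments as inclusive frame-index intervals is an assumption about how the intervals are stored.

    def temporal_iou(pred, gt):
        # pred, gt: (start_frame, end_frame) of the same stroke, end inclusive.
        inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]) + 1)
        union = (pred[1] - pred[0] + 1) + (gt[1] - gt[0] + 1) - inter
        return inter / union

    # Example: temporal_iou((120, 180), (118, 190)) is about 0.836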
Authors Tao Shu (陶树); Wang Meili (王美丽) (College of Information Engineering, Northwest A&F University, Yangling 712100, China; Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China; Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China)
Source Journal of Image and Graphics (《中国图象图形学报》, CSCD, Peking University Core), 2022, Issue 11, pp. 3280-3291 (12 pages)
Funding National Natural Science Foundation of China (61402374); Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs (2018AIOT-09); Shaanxi Key Research and Development Program, General Project in Agriculture and Rural Areas (2019NY-167).
Keywords pose estimation; meta video; badminton stroke localization; convolutional block attention module-temporal segment network (CBAM-TSN); morphological processing; badminton stroke recognition
