Abstract
Human body pose is an important semantic cue for action recognition, and convolutional neural networks (CNNs) can extract highly discriminative deep features from images. This paper extracts pose features from local image regions and deep features from the whole image, and explores how the two complement each other in action recognition. A pose representation is first introduced in which the pose of each body part is represented by the detection scores of a set of poselets that describe that part's pose variability. To suppress detection errors, a part-based model is designed as the detection context for each poselet. To train the CNN on datasets with a limited number of images, pre-training followed by fine-tuning is used. Experiments on two datasets show that combining the proposed pose features with the deep features substantially improves action recognition performance.
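The fusion step described in the abstract (per-part poselet detection scores concatenated with a holistic deep feature) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `pose_feature` and `fuse`, the toy part names and scores, the fixed alphabetical part order, and the L2 normalization applied before concatenation.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def pose_feature(poselet_scores_by_part):
    """Concatenate each body part's poselet detection scores.

    Parts are visited in sorted name order so the layout of the
    resulting vector is the same for every image (an assumption).
    """
    feature = []
    for part in sorted(poselet_scores_by_part):
        feature.extend(poselet_scores_by_part[part])
    return feature

def fuse(pose_vec, deep_vec):
    """Concatenate the pose feature and the deep feature.

    Each vector is L2-normalized first so neither dominates by scale;
    this normalization is an illustrative choice, not the paper's.
    """
    return l2_normalize(pose_vec) + l2_normalize(deep_vec)

# Toy inputs: hypothetical poselet scores for two parts and a 4-D deep feature.
scores = {"left_arm": [0.9, 0.1], "torso": [0.2, 0.7, 0.3]}
deep = [0.5, 0.0, 1.2, 0.3]

fused = fuse(pose_feature(scores), deep)
print(len(fused))
```

The fused vector would then be passed to whatever classifier is used for action recognition; the choice of classifier is outside this sketch.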
Authors
钱银中 (QIAN Yin-Zhong); 沈一帆 (SHEN Yi-Fan)
School of Software, Changzhou College of Information Technology, Changzhou 213164; School of Computer Science, Fudan University, Shanghai 200433; Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433
Source
Acta Automatica Sinica (《自动化学报》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2019, Issue 3, pp. 626-636 (11 pages)
Funding
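Supported by the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions (PPZY2015A090)
Supported by the Natural Science Project of Changzhou College of Information Technology (CXZK201803Z)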