Abstract
To model the spatial-temporal information in long-term image sequences for video-based action recognition, a deep neural network named the attention-embedded spatial-temporal feature fusion network (ASTFFN) is proposed. ASTFFN divides a long-term image sequence containing an action into several overlapping snippets and uses an attention-embedded feature extraction network (AFEN) to extract attention-weighted spatial or temporal features from the RGB images or optical flow images in each snippet. The weighted spatial and temporal features of each snippet are then fused to generate a video-level prediction for action recognition. Extensive experiments on two action recognition benchmarks, the UCF101 and HMDB51 databases, demonstrate the effectiveness of the proposed method. Compared with state-of-the-art action recognition algorithms, the proposed method achieves competitive recognition accuracy.
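The snippet-then-fuse pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `snippet_len` and `stride` parameters, and the use of a plain average as the fusion rule are all assumptions made for illustration, since the abstract does not specify the snippet sizes or how the snippet-level features are fused. The attention-embedded feature extractor (AFEN) is not modeled; the sketch assumes per-snippet class scores are already available.

```python
from typing import List, Sequence, Tuple


def split_into_overlapping_snippets(
    num_frames: int, snippet_len: int, stride: int
) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges for each snippet.

    Consecutive snippets overlap whenever stride < snippet_len.
    Both parameters are hypothetical; the abstract gives no values.
    """
    if snippet_len <= 0 or stride <= 0:
        raise ValueError("snippet_len and stride must be positive")
    if num_frames < snippet_len:
        raise ValueError("sequence is shorter than one snippet")
    last_start = num_frames - snippet_len
    return [(s, s + snippet_len) for s in range(0, last_start + 1, stride)]


def fuse_snippet_scores(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-snippet class scores into one video-level score vector.

    Assumption: a simple average over snippets stands in for the
    fusion step, which the abstract does not describe in detail.
    """
    if not snippet_scores:
        raise ValueError("at least one snippet is required")
    num_snippets = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [
        sum(scores[c] for scores in snippet_scores) / num_snippets
        for c in range(num_classes)
    ]


def predict_video_label(snippet_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_snippet_scores(snippet_scores)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, `split_into_overlapping_snippets(10, 4, 2)` yields four snippets that each share two frames with their neighbor, and `predict_video_label` then picks the class whose averaged score is highest across those snippets.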
Authors
孙宁
郝一嘉
宦睿智
刘佶鑫
韩光
SUN Ning; HAO Yijia; HUAN Ruizhi; LIU Jixin; HAN Guang (Engineering Research Center of Wideband Wireless Communication Technology of Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
《合肥工业大学学报(自然科学版)》
CAS
PKU Core (Peking University Core Journals)
2021, No. 8, pp. 1051-1058, 1145 (9 pages in total)
Journal of Hefei University of Technology: Natural Science
Funding
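National Natural Science Foundation of China (61471206, 61871445)
Natural Science Foundation of Jiangsu Province (BK61471206, BK61871445) and the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY218066).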
Keywords
attention mechanism
spatial-temporal feature fusion
action recognition