Abstract
To address the inability of existing networks to fully fuse the spatial and temporal information in video, a recognition model combining an attention-based two-stream CNN with DU-DLSTM is proposed. Video frames and the corresponding optical flow feature maps are extracted with OpenCV. The spatial-stream network decodes the optical flow feature maps to obtain a spatial attention enhancement vector and decodes the image sequence to obtain time-dimension feature vectors of the original images, which serve as the input to the temporal-stream network. The output features of the two networks are fused by weighting and fed into the DU-DLSTM module (a long short-term memory network with a unidirectional-bidirectional structure), and a Softmax maximum-likelihood function completes the action recognition task. The proposed method is robust and reaches a recognition accuracy of 98.9% on the KTH dataset.
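The weighted fusion and Softmax classification steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion weights, feature dimensions, or the internals of DU-DLSTM, so the function names (`fuse`, `softmax`, `predict`), the default weight `w_spatial=0.5`, and the toy vectors are all assumptions made for illustration; the DU-DLSTM stage between fusion and classification is omitted entirely.

```python
import math


def fuse(spatial, temporal, w_spatial=0.5):
    """Weighted sum of two equal-length feature vectors.

    w_spatial is a hypothetical fusion weight; the temporal stream
    receives the remaining weight (1 - w_spatial).
    """
    if len(spatial) != len(temporal):
        raise ValueError("feature vectors must have the same length")
    w_temporal = 1.0 - w_spatial
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the index of the most probable class."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Toy per-class scores standing in for the two streams' outputs.
    fused = fuse([1.0, 2.0, 0.5], [3.0, 4.0, 0.5], w_spatial=0.5)
    print(fused, predict(fused))
```

In a real pipeline the fused features would pass through the recurrent module before the Softmax layer, and the fusion weight would be tuned or learned rather than fixed.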
Authors
MA Cui-hong (马翠红)
WANG Yi (王毅)
MAO Zhi-qiang (毛志强)
College of Electrical Engineering, North China University of Science and Technology, Tangshan 063210, China
Source
Computer Engineering and Design (《计算机工程与设计》)
PKU Core Journal (北大核心)
2020, No. 10, pp. 2903-2906 (4 pages)
Funding
National Natural Science Foundation of China (61171058).