Research of Multimodal Attention-Based Description Generation of Videos in Wide Domain
Abstract This paper proposes a technique for automatically generating multilingual descriptions of wide-domain videos, built on a deep neural network with a multimodal attention mechanism. The description generation model consists of end-to-end convolutional neural networks and a bidirectional recurrent neural network; applying the multimodal attention mechanism markedly improves the model's video representation ability. A bidirectional recurrent neural network encoder fuses and encodes four kinds of multimodal video features (image, optical flow, C3D, and audio), and an attention-based decoder then decodes the resulting sequential video features into multilingual description sequences. The model was tested on an open-source video description dataset, and the results demonstrate the method's effectiveness: the METEOR score improved by 3.31%, reported as the best published result at the time. The technique can therefore serve as an important reference for research in related fields.
Author Du Xiaotong (杜晓童)
Source Computer Science and Application (《计算机科学与应用》), 2022, No. 10, pp. 2225-2232, 8 pages
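
The attention-based fusion of the four feature streams named in the abstract can be illustrated with a minimal sketch. The abstract does not say whether attention is applied across modalities, across time steps, or both, nor which scoring function is used, so this example assumes plain dot-product attention over one feature vector per modality. The function names, feature values, and decoder state are all hypothetical toy values, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention: weight each feature vector by its
    similarity to the query and return (weights, weighted sum)."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical features for one time step, assumed to be already
# projected into a shared 3-dimensional space (toy values).
modalities = {
    "image": [1.0, 0.0, 0.0],
    "optical_flow": [0.0, 1.0, 0.0],
    "c3d": [0.0, 0.0, 1.0],
    "audio": [0.5, 0.5, 0.0],
}
decoder_state = [2.0, 0.0, 0.0]  # hypothetical decoder hidden state

weights, context = attend(decoder_state, list(modalities.values()))
for name, weight in zip(modalities, weights):
    print(f"{name}: {weight:.3f}")
```

In this sketch the modality whose feature vector is most aligned with the decoder state receives the largest weight, and `context` is the fused vector a decoder step would consume. A real model would learn the projections and typically use a trained scoring network rather than a raw dot product.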