Multimodal information algorithms for video captioning
Chinese title: 基于多模态信息的视频描述算法 (cited by: 1)
Abstract: To exploit the different modalities of information in video, a video captioning algorithm based on multimodal information is proposed. Building on the basic encoder-decoder network, the algorithm pays closer attention to multimodal video information and high-level semantic attributes. In the encoder stage, static features, optical flow features, and video clip features are extracted from the video, and a separate semantic attribute detection network is designed to obtain high-level semantic features. To avoid exposure bias in the decoder stage and the mismatch between the training loss and the evaluation criteria, a training algorithm based on reinforcement learning is used, which trains the model with the objective evaluation criteria directly as the optimization target. The proposed algorithm achieves good experimental results on the public video captioning dataset MSVD.
Author: Sun Liang (孙亮), School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
Source: Information Technology and Network Security (《信息技术与网络安全》), 2019, No. 7, pp. 47-53, 71 (8 pages)
Keywords: video captioning; multimodal information; semantic attributes; reinforcement learning
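
The abstract states that the model is trained by reinforcement learning with the evaluation criterion as the optimization target, but gives no formulas. The following is only a minimal sketch of one common way to do this, a policy-gradient (REINFORCE) loss with a baseline, and is not taken from the paper. The names `toy_reward` and `policy_gradient_loss`, the unigram-precision reward (a stand-in for a real captioning metric), the use of a greedy caption as the baseline, and all example captions and probabilities are assumptions made for illustration.

```python
import math
from typing import Sequence


def toy_reward(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Hypothetical stand-in for a captioning metric: unigram precision."""
    if not candidate:
        return 0.0
    ref_words = set(reference)
    hits = sum(1 for word in candidate if word in ref_words)
    return hits / len(candidate)


def policy_gradient_loss(token_log_probs: Sequence[float],
                         sampled_reward: float,
                         baseline_reward: float) -> float:
    """REINFORCE-with-baseline surrogate loss for one sampled caption.

    Minimizing this value raises the log-probability of the sampled
    caption when it scores above the baseline and lowers it otherwise.
    """
    advantage = sampled_reward - baseline_reward
    return -advantage * sum(token_log_probs)


# Illustrative data (assumed, not from the paper).
reference = "a man is playing a guitar".split()
sampled = "a man is playing piano".split()   # caption sampled from the decoder
greedy = "a man is singing".split()          # greedy caption used as baseline

# Assumed per-token probabilities the decoder assigned to the sampled caption.
sampled_log_probs = [math.log(p) for p in (0.9, 0.8, 0.9, 0.6, 0.2)]

r_sampled = toy_reward(sampled, reference)
r_greedy = toy_reward(greedy, reference)
loss = policy_gradient_loss(sampled_log_probs, r_sampled, r_greedy)
print(f"reward(sampled)={r_sampled:.2f} reward(greedy)={r_greedy:.2f} loss={loss:.4f}")
```

Because the reward is computed on captions the model generates itself, this kind of objective sidesteps both problems named in the abstract: the decoder is trained on its own outputs rather than on ground-truth prefixes, and the quantity being optimized is the evaluation score rather than a per-token cross-entropy.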