
Video summarization by learning semantic information
Abstract Video summarization aims to generate a short, compact summary that represents the original video. However, existing methods focus on the representativeness and diversity of the summary and pay less attention to semantic information. To fully exploit the semantic information in video content, we propose a novel video summarization model that learns a visual-semantic embedding space so that the video features contain rich semantic information. The model generates a video summary and a text summary describing the original video simultaneously. It consists of three modules: a frame-level score weighting module that combines convolutional layers and fully connected layers to obtain frame-level importance scores; a visual-semantic embedding module that maps the visual and text features into a common embedding space and pulls them close to each other so that the two kinds of features reinforce one another; and a video caption generation module that generates a video summary with semantic information by minimizing the distance between the description generated for the video summary and the manually annotated text of the original video. At test time, the model produces a short text summary as a by-product of obtaining the video summary, which can help people understand the video content more intuitively. Experiments on the SumMe and TVSum datasets show that, by fusing semantic information, the proposed model outperforms existing advanced methods, improving the F-score by 0.5% and 1.6%, respectively.
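The first two modules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single linear layer with a sigmoid stands in for the convolutional and fully connected layers, the squared Euclidean distance is an assumed choice of distance, and every function name, weight, and feature value below is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_scores(frame_feats, w, b):
    """Assumed stand-in for the score module: one linear layer plus a sigmoid per frame."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, f)) + b) for f in frame_feats]


def weighted_video_feature(frame_feats, scores):
    """Score-weighted average of the frame features."""
    total = sum(scores)
    dim = len(frame_feats[0])
    return [sum(s * f[d] for s, f in zip(scores, frame_feats)) / total
            for d in range(dim)]


def project(vec, matrix):
    """Linear map into the shared embedding space, followed by L2 normalization."""
    out = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    norm = math.sqrt(sum(o * o for o in out)) or 1.0
    return [o / norm for o in out]


def embedding_distance(a, b):
    """Squared Euclidean distance (an assumed choice) between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy data: three frames with 2-D visual features and one 2-D text feature.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feat = [1.0, 0.2]

# Hypothetical parameters; in a trained model these would be learned.
w, b = [2.0, -2.0], 0.0
W_visual = [[1.0, 0.0], [0.0, 1.0]]
W_text = [[1.0, 0.0], [0.0, 1.0]]

scores = frame_scores(frames, w, b)
video_emb = project(weighted_video_feature(frames, scores), W_visual)
text_emb = project(text_feat, W_text)
dist = embedding_distance(video_emb, text_emb)

print("frame scores:", [round(s, 3) for s in scores])
print("visual-text distance:", round(dist, 4))
```

Training would lower `dist` so that the visual and text embeddings move closer, while frames with higher scores contribute more to the video feature and are the natural candidates for the summary.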
Authors HUA Rui (滑蕊); WU Xinxiao (吴心筱); ZHAO Wentian (赵文天) (School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China)
Source Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2021, No. 3, pp. 650-657 (8 pages)
Funding National Natural Science Foundation of China (61673062, 62072041).
Keywords video summarization; visual-semantic embedding space; video captioning; video key frame; Long Short-Term Memory (LSTM) model
