Dynamic Video Summarization Algorithm Based on Multi-modal Feature Fusion
Abstract: Video summarization technology extracts key frames or key shots from long original videos to generate a concise, compact summary, greatly shortening the time users spend browsing while still covering the main content of the video. Current video summarization algorithms generally ignore the motion information in the video, so the resulting summaries lack logical coherence and narrative continuity. To address this problem, this paper proposes a dynamic video summarization algorithm based on multi-modal feature fusion (MFFSN), which adopts a supervised encoder-decoder framework. In the encoder, deep neural networks extract multi-scale spatial features from the original video frames and multi-scale motion features from the optical flow images; a Motion Guided Attention (MGA) module then models spatio-temporal attention and fuses the spatial and motion features into multi-modal features. In the decoder, a self-attention mechanism attends to the salient features in the data, and a regression network then produces frame importance scores. Finally, key shots are selected with a knapsack algorithm to generate the dynamic summary. Experimental results on the SumMe benchmark dataset show that the proposed MFFSN outperforms existing video summarization algorithms of the same kind.
Authors: Qian Jingyuan (乾竞元), Gao Wei (高伟), Teng Guowei (滕国伟)
Source: Industrial Control Computer (《工业控制计算机》), 2022, No. 10, pp. 81-84 (4 pages)
Keywords: video summarization; multi-modal feature fusion; optical flow; attention mechanism
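
The last step in the abstract, selecting key shots with a knapsack algorithm from frame importance scores, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `select_key_shots`, the use of the mean frame score as a shot's value, and the 15% length budget are all assumptions made for illustration; the abstract does not specify them, and the shot boundaries are assumed to come from a separate segmentation step.

```python
from typing import List, Sequence, Tuple


def select_key_shots(
    frame_scores: Sequence[float],
    shot_bounds: Sequence[Tuple[int, int]],
    budget_ratio: float = 0.15,  # assumed summary-length budget, not from the paper
) -> List[int]:
    """Pick shot indices with a 0/1 knapsack.

    shot_bounds holds half-open (start, end) frame ranges, each non-empty.
    A shot's weight is its length in frames; its value is assumed here
    to be the mean importance score of its frames.
    """
    lengths = [end - start for start, end in shot_bounds]
    values = [
        sum(frame_scores[start:end]) / (end - start)
        for start, end in shot_bounds
    ]
    capacity = int(len(frame_scores) * budget_ratio)
    n = len(shot_bounds)

    # best[i][c]: highest total value using the first i shots within c frames.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v

    # Backtrack through the table to recover which shots were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)
```

Given 20 frames split into four 5-frame shots and a 50% budget, the function returns the two shots with the highest mean scores that fit within 10 frames. Using a knapsack rather than a simple top-k ranking lets the summary respect a total-length limit even when shots differ in duration.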