Dynamic Video Summarization Algorithm Based on Multi-modal Feature Fusion (基于多模态特征融合的动态视频摘要算法)

Abstract: Video summarization extracts key frames or key shots from a long original video to produce a concise, compact summary that covers the video's main content while greatly shortening the time users spend browsing. Existing video summarization algorithms generally ignore the motion information in a video, so their summaries lack logical coherence and narrative continuity. To address this problem, a dynamic video summarization algorithm based on multi-modal feature fusion (MFFSN) is proposed, built on a supervised encoder-decoder framework. In the encoder, deep neural networks extract multi-scale spatial features from the original video frames and multi-scale motion features from the optical flow images; a Motion Guided Attention (MGA) module then models spatio-temporal attention and fuses the spatial and motion features into multi-modal features. In the decoder, a self-attention mechanism attends to the salient features in the data, and a regression network outputs frame importance scores. Finally, key shots are selected with a knapsack algorithm to generate the dynamic summary. Experimental results on the SumMe benchmark dataset show that MFFSN outperforms existing video summarization algorithms of the same type.
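The final step of the abstract, choosing key shots with a knapsack algorithm once frame importance scores are available, can be sketched roughly as follows. The abstract does not say how shots are scored or what length budget is used, so the function name `select_key_shots`, the mean-of-frame-scores shot value, and the 15% `budget_ratio` default are all assumptions made for illustration. Shot boundaries are taken as given.

```python
from typing import List, Sequence, Tuple


def select_key_shots(
    frame_scores: Sequence[float],
    shots: Sequence[Tuple[int, int]],
    budget_ratio: float = 0.15,  # assumed budget, not stated in the abstract
) -> List[int]:
    """Return indices of shots chosen by a 0/1 knapsack.

    `shots` holds (start, end) frame indices with `end` exclusive; every
    shot is assumed to be non-empty.
    """
    capacity = int(len(frame_scores) * budget_ratio)
    lengths = [end - start for start, end in shots]
    # Assumed shot value: mean importance of the frames inside the shot.
    values = [sum(frame_scores[s:e]) / (e - s) for s, e in shots]

    # best[i][c]: highest total value using the first i shots
    # with a combined length of at most c frames.
    best = [[0.0] * (capacity + 1) for _ in range(len(shots) + 1)]
    for i in range(1, len(shots) + 1):
        w, v = lengths[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v

    # Walk the table backwards to recover which shots were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(len(shots), 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


# Toy input: 20 frames in four shots, with half the video as the budget.
toy_scores = [0.9] * 5 + [0.1] * 5 + [0.5] * 6 + [0.8] * 4
toy_shots = [(0, 5), (5, 10), (10, 16), (16, 20)]
picked = select_key_shots(toy_scores, toy_shots, budget_ratio=0.5)
```

On the toy input the two highest-scoring shots fit within the 10-frame budget together, so they are the ones selected. Dynamic programming is used here because shot lengths are small integers, which keeps the table size manageable.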

Authors: Qian Jingyuan (乾竞元), Gao Wei (高伟), Teng Guowei (滕国伟)

Affiliations: School of Communication and Information Engineering, Shanghai University (上海大学通信与信息工程学院); 上海文广科技(集团)有限公司

Source: Industrial Control Computer (《工业控制计算机》), 2022, No. 10, pp. 81-84 (4 pages)

Keywords: video summarization; multi-modal feature fusion; optical flow; attention mechanism

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]
