Abstract
A video summary is a highly condensed representation of video content, obtained by selecting a subset of frames that are both diverse and important. Starting from the observation that key frames are often not representative enough, this paper proposes a method that extracts video key frames using multi-channel features, in which a convolutional neural network (CNN) and a long short-term memory network (LSTM) predict the probability that each frame is selected. The original CNN features of the extracted frames are fed into an LSTM, and the difference features between pairs of adjacent frame features, which carry additional information about how neighboring frames differ, are processed in the same way. Because of the long-range dependencies captured by the LSTM, the whole network can learn more contextual information across the video. The scores of the two processed feature streams are fused to produce the final score that decides whether a frame is selected. The reinforcement learning mechanism in this paper further optimizes the video summary. Experiments are conducted on the two benchmark datasets SumMe and TVSum. The results show that the method significantly improves the F-score (harmonic mean) metric.
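Below is a minimal sketch of the two-stream scoring idea described in the abstract: pre-extracted CNN frame features and their adjacent-frame difference features each pass through an LSTM, and the two per-frame scores are fused into a selection probability. It assumes PyTorch; the class name TwoStreamFrameScorer, the feature/hidden dimensions, and the equal-weight averaging fusion are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class TwoStreamFrameScorer(nn.Module):
    """Scores each video frame with two LSTM streams and fuses the scores."""

    def __init__(self, feat_dim=1024, hidden_dim=256):
        super().__init__()
        # Stream 1: raw CNN features of each sampled frame.
        self.feat_lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Stream 2: difference features between adjacent frames.
        self.diff_lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Per-stream heads mapping LSTM hidden states to a scalar frame score.
        self.feat_head = nn.Linear(hidden_dim, 1)
        self.diff_head = nn.Linear(hidden_dim, 1)

    def forward(self, feats):
        # feats: (batch, num_frames, feat_dim) pre-extracted CNN frame features.
        # Adjacent-frame differences; pad the first position with zeros so the
        # two streams stay aligned in length.
        diffs = feats[:, 1:, :] - feats[:, :-1, :]
        diffs = torch.cat([torch.zeros_like(feats[:, :1, :]), diffs], dim=1)

        feat_out, _ = self.feat_lstm(feats)   # (batch, num_frames, hidden_dim)
        diff_out, _ = self.diff_lstm(diffs)

        feat_score = self.feat_head(feat_out).squeeze(-1)
        diff_score = self.diff_head(diff_out).squeeze(-1)

        # Score fusion: average the two per-frame scores, then a sigmoid gives
        # the probability that each frame is selected for the summary.
        return torch.sigmoid(0.5 * (feat_score + diff_score))


# Example: selection probabilities for a 120-frame clip with 1024-d features.
probs = TwoStreamFrameScorer()(torch.randn(1, 120, 1024))  # shape (1, 120)
```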
Authors
LI Qiaofeng; ZHAO Ye (School of Computer and Information, Hefei University of Technology, Hefei 230009, China)
Source
Intelligent Computer and Applications, 2020, No. 10, pp. 1-5 (5 pages)
Funding
National Natural Science Foundation of China (61876056, 61502138).