Abstract
Facing massive volumes of video data, video summarization plays an increasingly important role in video retrieval, video browsing, and related fields. It aims to capture the important information in an input video by generating short video clips or selecting a set of key frames. Most existing methods focus on the representativeness and diversity of the summary, without considering multi-scale contextual information such as video structure. To address this problem, a video summarization model based on fully convolutional sequence networks is proposed, in which temporal pyramid pooling extracts multi-scale contextual information from the video and a fully connected conditional random field labels the video frame sequence. Experiments on the SumMe and TVSum datasets show that the proposed model outperforms the fully convolutional sequence network, improving the F-score on the two datasets by 1.6% and 3.0%, respectively.
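As a rough illustration of the temporal pyramid pooling idea the abstract mentions, the sketch below pools a sequence of per-frame features at several temporal scales and concatenates the results. This is a generic sketch, not the authors' implementation; the function name, the pyramid levels, and the use of max pooling are assumptions.

```python
import numpy as np

def temporal_pyramid_pooling(features, levels=(1, 2, 4)):
    """Pool a (T, D) frame-feature sequence at several temporal scales.

    For each pyramid level with n bins, the timeline is split into n
    roughly equal segments and each segment is max-pooled, so coarse
    levels capture global context and fine levels capture local context.
    Returns a fixed-length vector of shape (sum(levels) * D,).
    """
    T, D = features.shape
    pooled = []
    for n_bins in levels:
        # Integer bin edges that partition the T frames into n_bins segments.
        edges = np.linspace(0, T, n_bins + 1).astype(int)
        for i in range(n_bins):
            # Guarantee each segment contains at least one frame.
            start, end = edges[i], max(edges[i] + 1, edges[i + 1])
            pooled.append(features[start:end].max(axis=0))
    return np.concatenate(pooled)
```

For example, a 30-frame sequence of 8-dimensional features with levels (1, 2, 4) yields a vector of length (1 + 2 + 4) * 8 = 56, regardless of the input length, which is what lets the multi-scale context be fed to a fixed-size downstream layer.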
Authors
Wang Hao; Peng Li (School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source
《激光与光电子学进展》 (Laser & Optoelectronics Progress)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, No. 22, pp. 407-415 (9 pages)
Funding
National Natural Science Foundation of China (61873112).
Keywords
machine vision
video summarization
deep learning
fully convolutional sequence networks
convolutional neural networks