Abstract
Unlike other forms of multimedia retrieval, video retrieval must handle the large amount of information contained in videos, so computing the similarity between videos is computationally expensive. In addition, feature extraction often ignores the temporal correlation between video frames, which leads to insufficient features and lowers retrieval accuracy. To address these problems, this study proposes a video retrieval method based on 3D convolution and hashing. The method builds an end-to-end framework that uses a 3D convolutional neural network to extract features from representative frames selected from the video, maps these features into a low-dimensional Hamming space, and computes similarity in that space. Experimental results on two video datasets show that the proposed method improves accuracy considerably compared with recent video retrieval algorithms.
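The retrieval step described in the abstract, comparing videos by their codes in Hamming space, can be illustrated with a minimal sketch. This is not the authors' implementation: the 3D convolutional network is omitted, the 8-dimensional feature vectors are made-up stand-ins for its outputs, and binarizing by sign (`value > 0`) is an assumed thresholding rule. The function names `binarize`, `hamming_distance`, and `rank_by_hamming` are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float]) -> int:
    """Pack a real-valued feature vector into an integer hash code.

    Assumption: each bit is 1 when the feature is positive, else 0.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by increasing Hamming distance."""
    distances = [(i, hamming_distance(query, code)) for i, code in enumerate(database)]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical network outputs standing in for per-video features.
video_features = [
    [0.9, -0.2, 0.4, -0.7, 0.1, -0.3, 0.8, -0.5],
    [-0.6, 0.3, -0.1, 0.5, -0.9, 0.2, -0.4, 0.7],
    [0.8, -0.1, 0.3, -0.6, -0.2, -0.4, 0.9, -0.3],
]
database_codes = [binarize(f) for f in video_features]
query_code = binarize([0.7, -0.3, 0.5, -0.8, 0.2, -0.1, 0.6, -0.4])

# Most similar videos come first: smaller distance means closer codes.
ranking = rank_by_hamming(query_code, database_codes)
print(ranking)
```

Because each comparison reduces to an XOR and a bit count on short codes, ranking a database this way is far cheaper than comparing high-dimensional real-valued features, which is the motivation the abstract gives for mapping features into Hamming space.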
Authors
陈汗青
李菲菲
陈虬
CHEN Hanqing; LI Feifei; CHEN Qiu (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
《电子科技》
2022, No. 4, pp. 35-39, 66 (6 pages in total)
Electronic Science and Technology
Funding
Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX).
Keywords
video retrieval
3D convolution
feature representation
hash method
supervised learning
feature reduction
Hamming space
similarity matching