Video Memorability Prediction Based on Multi-Modal Features Fusion
Abstract With the explosive growth of online video, video memorability has become an active research topic. Video memorability measures how memorable a video is, and computational models that predict it automatically have broad application prospects. Most current research on video memorability prediction focuses on general visual features or semantic factors and does not consider the influence of depth features. This paper focuses on the depth features of video. After the video is preprocessed, an existing depth estimation model is used to extract depth maps, and the original video frames and the depth maps are fed together into a pre-trained ResNet152 network to extract depth features. The TF-IDF algorithm is used to extract semantic features of the video, assigning different weights to words that affect video memorability. Finally, the depth features, the semantic features, and C3D spatiotemporal features extracted from the video content are combined by late fusion, yielding a multi-modal fusion model for video memorability prediction. Experiments are conducted on the large public dataset (VideoMem) provided by the MediaEval 2019 conference. The model achieves a Spearman's rank correlation of 0.545 on the short-term memorability prediction task (0.240 on the long-term task), which demonstrates the effectiveness of the model.
Authors CHANG Shiying (常诗颖); HU Yan (胡燕) (School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China)
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the PKU Core Journals list, 2022, No. 14, pp. 219-226 (8 pages)
Funding Natural Science Foundation of Hubei Province (2019CFC919).
Keywords video memorability; multi-modal; features fusion
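
The late-fusion and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average used here is an assumption, and the function names (`rank`, `spearman`, `late_fusion`), the weights, and all scores are hypothetical values chosen for illustration, not data or code from the paper.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def late_fusion(per_modality_scores, weights):
    """Combine per-modality predictions with a weighted average per video.

    Assumption: a weighted average stands in for the paper's fusion rule,
    which the abstract does not specify.
    """
    total = sum(weights)
    n_videos = len(per_modality_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_modality_scores)) / total
        for i in range(n_videos)
    ]


# Hypothetical memorability predictions for four videos, one list per modality.
depth_pred = [0.80, 0.60, 0.90, 0.70]     # from depth features
semantic_pred = [0.75, 0.65, 0.85, 0.60]  # from TF-IDF semantic features
c3d_pred = [0.70, 0.72, 0.95, 0.66]       # from C3D spatiotemporal features
ground_truth = [0.78, 0.62, 0.92, 0.68]   # hypothetical annotated scores

fused = late_fusion([depth_pred, semantic_pred, c3d_pred], weights=[0.4, 0.3, 0.3])
print(spearman(fused, ground_truth))
```

Because Spearman's correlation depends only on rank order, a fused prediction scores well whenever it orders the videos the same way the annotations do, even if the absolute values differ.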