
From Video to Language: Survey of Video Captioning and Description

Cited by: 11
Abstract: Video captioning and description is the task of summarizing and re-expressing the visual content of a video in natural language. It is challenging because it requires converting information between modalities, and visual data and language are heterogeneous, which makes the data processing pipeline complex. This survey focuses on models built on the "encoder-decoder" architecture. According to how visual features are encoded and used, current models are classified into four types: models based on mean/max pooling of visual features, models based on video sequential memory, models based on 3D CNN features, and hybrid models. Representative works of each type are described and analyzed. Finally, existing problems and possible research trends are summarized. The survey points out that prior knowledge such as emotion and logical semantics in complex videos should be further mined and embedded in order to generate structured, logically coherent paragraph descriptions, and that model optimization, dataset construction, and evaluation metrics for video captioning and description still need deeper investigation.
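As an illustration of the first category named in the abstract, the following is a minimal sketch of mean and max pooling over per-frame feature vectors, which collapses a variable-length video into one fixed-length vector that a decoder could consume. It is not taken from the surveyed paper: the function names `mean_pool` and `max_pool` and the toy 2-dimensional features are assumptions made for illustration, and real systems would pool CNN features with hundreds or thousands of dimensions.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average each feature dimension over all frames (hypothetical helper)."""
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    # zip(*frames) groups the values of one dimension across all frames.
    return [sum(dim) / n for dim in zip(*frames)]


def max_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Take the maximum of each feature dimension over all frames (hypothetical helper)."""
    if not frames:
        raise ValueError("at least one frame is required")
    return [max(dim) for dim in zip(*frames)]


# Toy example: three frames, each with a 2-dimensional feature vector.
frame_features = [[1.0, 4.0], [3.0, 2.0], [5.0, 0.0]]
video_mean = mean_pool(frame_features)  # [3.0, 2.0]
video_max = max_pool(frame_features)    # [5.0, 4.0]
```

Both pooled vectors have the same length regardless of how many frames the video contains, but they discard frame order, which is the limitation that sequential-memory and 3D CNN approaches aim to address.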
Authors: TANG Peng-Jie (汤鹏杰); WANG Han-Li (王瀚漓) (College of Electronics and Information Engineering, Jinggangshan University, Ji'an 343009; Department of Computer Science and Technology, Tongji University, Shanghai 201804; Key Laboratory of Embedded System and Service Computing (Ministry of Education), Tongji University, Shanghai 200092; Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 200092)
Source: Acta Automatica Sinica (《自动化学报》), 2022, No. 2, pp. 375-397 (23 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
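Funding: Supported by the National Natural Science Foundation of China (62062041, 61976159, 61962003), the Shanghai Science and Technology Innovation Action Plan (20511100700), the Natural Science Foundation of Jiangxi Province (20202BAB202017, 20202BABL202007), and the Doctoral Start-up Fund of Jinggangshan University (JZB1923).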
Keywords: video description; convolutional neural network; recurrent neural network; paragraph generation; emotion expression; logical semantics