Journal Articles
6 articles found
1. A Video Captioning Method by Semantic Topic-Guided Generation
Authors: Ou Ye, Xinli Wei, Zhenhua Yu, Yan Fu, Ying Yang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 1, pp. 1071-1093 (23 pages)
In video captioning methods based on an encoder-decoder, limited visual features are extracted by an encoder and a natural sentence describing the video content is generated by a decoder. However, this kind of method depends on a single video input source and few visual labels, and it suffers from a semantic-alignment problem between video contents and the generated sentences, so it is not suitable for accurately comprehending and describing video contents. To address this issue, this paper proposes a video captioning method with semantic topic-guided generation. First, a 3D convolutional neural network is utilized to extract the spatiotemporal features of videos during encoding. Then, the semantic topics of video data are extracted using visual labels retrieved from similar video data. In the decoding, a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 (GPT-2) deep neural network, which decreases the influence of "deviation" in the semantic mapping process between videos and texts by jointly decoding a baseline and the semantic topics of video contents. During this process, the designed Enhance-TopK sampling algorithm alleviates the long-tail problem by dynamically adjusting the probability distribution of the predicted words. Finally, experiments are conducted on two public datasets, Microsoft Research Video Description and Microsoft Research-Video to Text. The experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches. Specifically, the performance indicators Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Recall-Oriented Understudy for Gisting Evaluation-longest common subsequence (ROUGE-L), and Consensus-based Image Description Evaluation (CIDEr) of the proposed method are improved by 1.2%, 0.1%, 0.3%, and 2.4% on the Microsoft Research Video Description dataset, and by 0.1%, 1.0%, 0.1%, and 2.8% on the Microsoft Research-Video to Text dataset, respectively, compared with existing video captioning methods. As a result, the proposed method can generate video captions that are more closely aligned with human natural-language expression habits.
Keywords: video captioning, encoder-decoder, semantic topic, jointly decoding, Enhance-TopK sampling
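The Enhance-TopK sampling described in the entry above adjusts the probability distribution over predicted words at each decoding step so that long-tail words keep a reasonable chance of being sampled. The paper's own algorithm is not reproduced in this listing; the following is only a minimal Python/NumPy sketch of a generic top-k sampling step with a flattening adjustment, meant to show where such a mechanism sits in a decoder loop. The function name enhance_topk_sample and the boost parameter are illustrative assumptions, not the authors' code.

```python
import numpy as np

def enhance_topk_sample(logits, k=10, boost=1.2, rng=None):
    """Illustrative top-k sampling step: keep the k most likely tokens,
    mildly flatten their distribution (boost > 1 lifts lower-ranked,
    long-tail tokens inside the top-k set), renormalize, and sample one
    token id.

    NOTE: a generic sketch, not the Enhance-TopK algorithm from the paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Indices of the k largest logits.
    topk_idx = np.argpartition(logits, -k)[-k:]
    topk_logits = logits[topk_idx]

    # Softmax over the top-k logits only.
    topk_logits = topk_logits - topk_logits.max()
    probs = np.exp(topk_logits)
    probs /= probs.sum()

    # "Dynamic adjustment": raising probabilities to 1/boost flattens the
    # distribution and gives lower-ranked words more probability mass.
    probs = probs ** (1.0 / boost)
    probs /= probs.sum()

    return int(rng.choice(topk_idx, p=probs))

# Toy usage: a fake next-token distribution over a 1,000-word vocabulary.
fake_logits = np.random.default_rng(0).normal(size=1000)
print(enhance_topk_sample(fake_logits, k=10, boost=1.5))
```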
2. A Sentence Retrieval Generation Network Guided Video Captioning
Authors: Ou Ye, Mimi Wang, Zhenhua Yu, Yan Fu, Shun Yi, Jun Deng. Computers, Materials & Continua (SCIE, EI), 2023, Issue 6, pp. 5675-5696 (22 pages)
Currently, video captioning models based on an encoder-decoder mainly rely on a single video input source. The contents of video captioning are limited, since few studies have employed external corpus information to guide caption generation, which is not conducive to accurate description and understanding of video content. To address this issue, a novel video captioning method guided by a sentence retrieval generation network (ED-SRG) is proposed in this paper. First, a ResNeXt network model, an efficient convolutional network for online video understanding (ECO) model, and a long short-term memory (LSTM) network model are integrated to construct an encoder-decoder, which is utilized to extract the 2D features, 3D features, and object features of video data, respectively. These features are decoded to generate textual sentences that conform to the video content and are used for sentence retrieval. Then, a sentence-transformer network model is employed to retrieve sentences from an external corpus that are semantically similar to these generated sentences, and the candidate sentences are screened out through similarity measurement. Finally, a novel GPT-2 network model is constructed based on the GPT-2 network structure. The model introduces a designed random selector to randomly select predicted words with high probability in the corpus, which is used to guide the generation of textual sentences that are more in line with human natural-language expressions. The proposed method is compared with several existing works in experiments. The results show that the indicators BLEU-4, CIDEr, ROUGE_L, and METEOR are improved by 3.1%, 1.3%, 0.3%, and 1.5% on the public dataset MSVD, and by 1.3%, 0.5%, 0.2%, and 1.9% on the public dataset MSR-VTT, respectively. It can be seen that the proposed method can generate video captions with richer semantics than several state-of-the-art approaches.
Keywords: video captioning, encoder-decoder, sentence retrieval, external corpus, RS, GPT-2 network model
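The retrieval stage in the entry above screens sentences from an external corpus by semantic similarity to the draft captions. As a rough sketch only, and assuming the off-the-shelf sentence-transformers package with the all-MiniLM-L6-v2 checkpoint (the paper's exact model, corpus, and threshold are not given in this listing), candidate screening by cosine similarity could look like this:

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: any sentence-embedding checkpoint works for illustration; the
# paper's actual retrieval model and external corpus are not specified here.
model = SentenceTransformer("all-MiniLM-L6-v2")

draft_caption = "a man is playing a guitar on stage"
external_corpus = [
    "a musician performs a guitar solo in front of a crowd",
    "a dog runs across a grassy field",
    "someone strums an acoustic guitar indoors",
]

query_emb = model.encode(draft_caption, convert_to_tensor=True)
corpus_emb = model.encode(external_corpus, convert_to_tensor=True)

# Cosine-similarity screening: keep corpus sentences above a threshold
# (0.4 is an arbitrary illustrative value).
scores = util.cos_sim(query_emb, corpus_emb)[0]
candidates = [
    (sent, float(score))
    for sent, score in zip(external_corpus, scores)
    if score > 0.4
]
for sentence, score in sorted(candidates, key=lambda x: -x[1]):
    print(f"{score:.2f}  {sentence}")
```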
3. The Application of Video Captioning Artificial Intelligence Technology in Television Media
Author: Liang Xiao. 卫星电视与宽带多媒体 (Satellite TV and Broadband Multimedia), 2021, Issue 6, pp. 90-92 (3 pages)
Since the 1990s, television media technology in China has developed rapidly. As the variety and number of television programs keep growing, the work of adding content descriptions to video programs has become increasingly tedious. Meanwhile, the rapid growth of the Internet and self-media has been accompanied by an explosion in the volume of media resources, and how television producers can quickly and accurately select the material they need from these resources has become a major problem. This paper explores how Video Captioning technology can be applied to television programs in the current artificial intelligence environment, and proposes an end-to-end system solution that achieves high-quality, high-efficiency textual description of large-scale media content.
Keywords: video captioning, television programs, deep learning, artificial intelligence
4. Trends in Event Understanding and Caption Generation/Reconstruction in Dense Video: A Review
Authors: Ekanayake Mudiyanselage Chulabhaya Lankanatha Ekanayake, Abubakar Sulaiman Gezawa, Yunqi Lei. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 2941-2965 (25 pages)
Video description generates natural language sentences that describe the subject, verb, and objects of the targeted video. Video description has been used to help visually impaired people understand video content, and it also plays an essential role in developing human-robot interaction. Dense video description is more difficult than simple video captioning because of object interactions and event overlapping. Deep learning is changing the shape of computer vision (CV) and natural language processing (NLP) technologies, and there are hundreds of deep learning models, datasets, and evaluations that can close the gaps in current research. This article addresses this gap by evaluating state-of-the-art approaches, focusing especially on deep learning and machine learning for video captioning in a dense environment. Classic techniques from existing machine learning are reviewed, and deep learning models are presented together with details of benchmark datasets and their respective domains. The paper reviews various evaluation metrics, including Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Word Mover's Distance (WMD), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), with their pros and cons. Finally, the article lists future directions and proposes work on context enhancement using key-scene extraction with object detection in a particular frame, in particular how to improve the context of video descriptions by analyzing key-frame detection through morphological image analysis. Additionally, the paper discusses a novel approach involving sentence reconstruction and context improvement through key-frame object detection, which incorporates the fusion of large language models to refine results. The final results arise from enhancing the generated text of the proposed model by improving the predicted text and isolating objects using various keyframes, which identify dense events occurring in the video sequence.
Keywords: video description, video to text, video caption, sentence reconstruction
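The metrics this review surveys (BLEU, METEOR, WMD, ROUGE) all score a generated caption against reference sentences. As a small self-contained illustration of the idea behind BLEU's clipped n-gram precision, here is a simplified single-reference, unsmoothed version in Python; real BLEU implementations use multiple references and smoothing, so treat this only as a sketch of the core computation.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty (single reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidates.
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return brevity * geo_mean

# BLEU-2 on a toy pair (unsmoothed BLEU-4 would be 0 for such short sentences).
print(round(simple_bleu("a man is playing a guitar",
                        "a man plays a guitar on stage", max_n=2), 3))
```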
5. Tentative model of integrating authentic captioned video to facilitate ESL learning
Authors: WANG Yan-dong, SHEN Cai-fen. Sino-US English Teaching, 2007, Issue 9, pp. 1-13 (13 pages)
Guided by relevant learning theories and based on an analysis of the attributes of captioned video, the characteristics of Chinese ESL learners, and the characteristics of ESL learning itself, this paper proposes a model for integrating captioned video into the ESL acquisition process. The model focuses on the feasibility of using captioned video to facilitate or support ESL learning.
Keywords: authentic captioned video, ESL, information processing theory, dual coding theory
6. Captioning Videos Using Large-Scale Image Corpus (Cited by: 1)
Authors: Xiao-Yu Du, Yang Yang, Liu Yang, Fu-Min Shen, Zhi-Guang Qin, Jin-Hui Tang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, Issue 3, pp. 480-493 (14 pages)
Video captioning is the task of assigning complex high-level semantic descriptions (e.g., sentences or paragraphs) to video data. Unlike previous video analysis techniques such as video annotation, video event detection, and action recognition, video captioning is much closer to human cognition, with a smaller semantic gap. However, the scarcity of captioned video data severely limits the development of video captioning. In this paper, we propose a novel video captioning approach that describes videos by leveraging a freely available image corpus with abundant literal knowledge. There are two key aspects of our approach: 1) an effective integration strategy bridging videos and images, and 2) high efficiency in handling ever-increasing training data. To achieve these goals, we adopt sophisticated visual hashing techniques to efficiently index and search large-scale image collections for relevant captions, which is highly extensible to evolving data and the corresponding semantics. Extensive experimental results on various real-world visual datasets show the effectiveness of our approach with different hashing techniques, e.g., LSH (locality-sensitive hashing), PCA-ITQ (principal component analysis iterative quantization), and supervised discrete hashing, as compared with state-of-the-art methods. It is worth noting that the empirical computational cost of our approach is much lower than that of an existing method: it takes 1/256 of the memory and 1/64 of the time cost of the method of Devlin et al.
Keywords: video captioning, hashing, image captioning
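The last entry indexes a large image corpus with visual hashing so that captions can be retrieved for a video efficiently. The snippet below is a generic sketch of random-hyperplane LSH, one of the hashing families the abstract names: real-valued features are turned into binary codes whose Hamming distance approximates cosine distance, and corpus items are ranked by that distance. It is not the authors' pipeline, and the feature dimensions and corpus size are made up.

```python
import numpy as np

class RandomHyperplaneLSH:
    """Generic random-hyperplane LSH: the sign pattern of random projections
    gives a binary code whose Hamming distance approximates cosine distance."""

    def __init__(self, dim, n_bits=64, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))

    def encode(self, features):
        # features: (n_items, dim) -> binary codes of shape (n_items, n_bits)
        return (features @ self.planes.T > 0).astype(np.uint8)

def hamming_rank(query_code, corpus_codes, top_k=5):
    """Indices of the top_k corpus items closest to the query in Hamming distance."""
    dists = np.count_nonzero(corpus_codes != query_code, axis=1)
    return np.argsort(dists)[:top_k]

# Toy usage with made-up 512-d visual features for one video frame and a
# corpus of 10,000 captioned images.
rng = np.random.default_rng(42)
lsh = RandomHyperplaneLSH(dim=512, n_bits=64)
corpus_feats = rng.normal(size=(10_000, 512))
query_feat = rng.normal(size=(1, 512))

corpus_codes = lsh.encode(corpus_feats)
query_code = lsh.encode(query_feat)[0]
print(hamming_rank(query_code, corpus_codes))  # indices of nearest images
```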