Journal Articles
5 articles found
1. Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment
Authors: Emran Al-Buraihy, Dan Wang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 6, pp. 3913-3938 (26 pages).
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image description, our approach achieves CIDEr scores of 87.9% (English) and 81.7% (Arabic), respectively, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Keywords: cross-language image description; multimodal deep learning; semantic matching; reward mechanisms
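The semantic-matching idea in this abstract can be illustrated with a minimal sketch: embed the image and the generated target-language caption into a shared space and use their cosine similarity as a scalar reward for the caption generator. The encoders, dimensions, and function names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def semantic_matching_reward(image_emb: torch.Tensor,
                             caption_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between image and caption embeddings, used as a
    scalar reward for the caption generator. Both tensors: (batch, dim),
    assumed to come from pretrained image/text encoders projected into a
    shared space (hypothetical setup)."""
    image_emb = F.normalize(image_emb, dim=-1)
    caption_emb = F.normalize(caption_emb, dim=-1)
    return (image_emb * caption_emb).sum(dim=-1)  # (batch,)

# Toy usage: random embeddings stand in for real encoder outputs.
img = torch.randn(4, 512)
cap = torch.randn(4, 512)
print(semantic_matching_reward(img, cap).shape)  # torch.Size([4])
```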
2. Enhancing Image Description Generation through Deep Reinforcement Learning: Fusing Multiple Visual Features and Reward Mechanisms
Authors: Yan Li, Qiyuan Wang, Kaidi Jia. Computers, Materials & Continua (SCIE, EI), 2024, Issue 2, pp. 2469-2489 (21 pages).
Image description sits at the intersection of computer vision and natural language processing, and it has important prospects, including helping computers understand images and providing information for the visually impaired. This study presents an innovative approach employing deep reinforcement learning to enhance the accuracy of natural language descriptions of images. Our method focuses on refining the reward function in deep reinforcement learning, facilitating the generation of precise descriptions by aligning visual and textual features more closely. Our approach comprises three key architectures. First, it utilizes Residual Network 101 (ResNet-101) and Faster Region-based Convolutional Neural Network (Faster R-CNN) to extract average and local image features, respectively, followed by a dual attention mechanism for intricate feature fusion. Second, a Transformer model derives contextual semantic features from the textual data. Finally, descriptive text is generated by a two-layer long short-term memory network (LSTM), directed by the value and reward functions. Compared with an image description method that relies on deep learning alone, our Bilingual Evaluation Understudy (BLEU-1) score is 0.762, 1.6% higher, and our BLEU-4 score is 0.299; Consensus-based Image Description Evaluation (CIDEr) reaches 0.998 and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) reaches 0.552, the latter a 0.36% improvement. These results not only attest to the viability of our approach but also highlight its superiority in the realm of image description. Future research can explore the integration of our method with other artificial intelligence (AI) domains, such as emotional AI, to create more nuanced and context-aware systems.
Keywords: image description; deep reinforcement learning; attention mechanism
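A minimal sketch of the dual-attention fusion this abstract describes: one attention block over ResNet-style global features, another over Faster R-CNN-style region features, with the two context vectors fused for the decoder. Module structure and dimensions are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class DualAttentionFusion(nn.Module):
    """Attend separately to a global feature map and to region features,
    then fuse the two context vectors for a caption decoder. Illustrative
    sketch only; the paper's exact fusion is not reproduced here."""
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.region_attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(2 * feat_dim, hidden_dim)

    def forward(self, query, global_feats, region_feats):
        # query: (B, 1, D) decoder state; feats: (B, N, D)
        g_ctx, _ = self.global_attn(query, global_feats, global_feats)
        r_ctx, _ = self.region_attn(query, region_feats, region_feats)
        return torch.tanh(self.fuse(torch.cat([g_ctx, r_ctx], dim=-1)))

# Toy usage: random tensors stand in for ResNet-101 / Faster R-CNN outputs.
fusion = DualAttentionFusion(feat_dim=512, hidden_dim=512)
q = torch.randn(2, 1, 512)
out = fusion(q, torch.randn(2, 49, 512), torch.randn(2, 36, 512))
print(out.shape)  # torch.Size([2, 1, 512])
```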
3. Feedback LSTM Network Based on Attention for Image Description Generator (cited by: 2)
Authors: Zhaowei Qu, Bingyu Cao, Xiaoru Wang, Fu Li, Peirong Xu, Luhan Zhang. Computers, Materials & Continua (SCIE, EI), 2019, Issue 5, pp. 575-589 (15 pages).
Images are complex multimedia data which contain rich semantic information. Most current image description generator algorithms only generate a plain description, lacking any distinction between primary and secondary objects, which leads to insufficient high-level semantics and accuracy under public evaluation criteria. The major issue is the lack of an effective network for generating high-level semantic sentences that describe the motion and state of the principal object in detail. To address this issue, this paper proposes the Attention-based Feedback Long Short-Term Memory Network (AFLN). Based on the existing encoder-decoder framework, our method comprises two independent subtasks: an attention-based feedback LSTM network in the decoding phase, and the Convolutional Block Attention Module (CBAM) in the encoding phase. First, we propose an attention-based network that feeds back the features corresponding to the word generated by the previous LSTM decoding unit. We implement this feedback guidance through a related-field mapping algorithm, which quantifies the correlation between the previous word and the next word, so that the main object can be tracked with a highlighted, detailed description. Second, we exploit the attention idea and apply CBAM, a lightweight and general module, after the last layer of a pretrained VGG-16 network; it enhances the expression of image coding features by combining channel and spatial attention maps with negligible overhead. Extensive experiments on the COCO dataset validate the superiority of our network over state-of-the-art algorithms, both in scores and in qualitative results: the BLEU-4 score increases from 0.291 to 0.301, while the CIDEr score rises from 0.912 to 0.952.
Keywords: image description generator; feedback LSTM network; attention; CBAM
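CBAM itself is a published, general-purpose module, so it can be sketched from its standard formulation: channel attention from a shared MLP over average- and max-pooled descriptors, followed by spatial attention from a 7x7 convolution over channel-wise average and max maps. How it is wired beyond "after the last layer of VGG-16" is not specified by the abstract; the shapes below are assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed
    by spatial attention, per the standard formulation (Woo et al., 2018)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# E.g. applied to a VGG-16-sized final feature map (assumed shape).
feat = torch.randn(2, 512, 14, 14)
print(CBAM(512)(feat).shape)  # torch.Size([2, 512, 14, 14])
```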
4. Low-Complexity Redundancy Insertion in Multiple Description Coding
Authors: Yang Rener, Li Jinxiang, Wen Huafeng. Journal of Electronics (China), 2008, Issue 3, pp. 409-414 (6 pages).
Multiple description coding has recently been proposed as a joint source and channel coding technique for robust image transmission over unreliable networks, and it can offer a range of tradeoffs between signal redundancy and transmission robustness. In this letter, a novel pre- and post-processing method with flexible redundancy insertion for polyphase-downsampling multiple description coding is presented. The proposed method can be implemented as pre- and post-processing around all standards for image and video communication, which is an obvious advantage. Simulation results show that this approach reduces computational complexity while providing flexible redundancy insertion, making the system robust to any packet-loss situation over different networks.
Keywords: multiple description image coding; polyphase downsampling; redundancy insertion
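The polyphase-downsampling step at the heart of this letter can be illustrated with a toy sketch: split an image into four 2x2-subsampled components (one per description) and conceal any lost description using the components that arrive. The averaging-based concealment below is a crude stand-in for real post-processing, not the authors' method.

```python
import numpy as np

def polyphase_split(img: np.ndarray):
    """Split an image into four polyphase components by 2x2 subsampling;
    each component is transmitted as one 'description'."""
    return [img[i::2, j::2] for i in range(2) for j in range(2)]

def reconstruct(descriptions, received):
    """Rebuild the image from whichever descriptions arrived. Missing
    pixels are filled from the average of received components -- a crude
    stand-in for real error concealment."""
    h, w = descriptions[0].shape
    out = np.zeros((2 * h, 2 * w), dtype=np.float64)
    fill = np.mean([descriptions[k] for k in received], axis=0)
    for idx, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
        out[i::2, j::2] = descriptions[idx] if idx in received else fill
    return out

# Toy usage: lose description 3 and reconstruct from the other three.
img = np.arange(64, dtype=np.float64).reshape(8, 8)
parts = polyphase_split(img)
approx = reconstruct(parts, received={0, 1, 2})
print(np.abs(approx - img).max())  # concealment error on the lost phase
```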
5. LSTM-in-LSTM for generating long descriptions of images
Authors: Jun Song, Siliang Tang, Jun Xiao, Fei Wu, Zhongfei (Mark) Zhang. Computational Visual Media, 2016, Issue 4, pp. 379-388 (10 pages).
In this paper, we propose an approach for generating rich, fine-grained textual descriptions of images. In particular, we use an LSTM-in-LSTM (long short-term memory) architecture, which consists of an inner LSTM and an outer LSTM. The inner LSTM effectively encodes the long-range implicit contextual interaction between visual cues (i.e., the spatially concurrent visual objects), while the outer LSTM generally captures the explicit multimodal relationship between sentences and images (i.e., the correspondence of sentences and images). This architecture is capable of producing a long description by predicting one word at each time step conditioned on the previously generated word, a hidden vector (via the outer LSTM), and a context vector of fine-grained visual cues (via the inner LSTM). Our model outperforms state-of-the-art methods on several benchmark datasets (Flickr8k, Flickr30k, MSCOCO) when used to generate long, rich, fine-grained descriptions of given images, in terms of four different metrics (BLEU, CIDEr, ROUGE-L, and METEOR).
Keywords: long short-term memory (LSTM); image description generation; computer vision; neural network
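A rough sketch of the two-level idea, under stated simplifications: an inner LSTM summarizes region-level visual cues into a context vector, and an outer LSTM emits one word per step conditioned on that context and the previous word. Note the paper recomputes fine-grained context at each decoding step; here a single context vector and random features keep the sketch short.

```python
import torch
import torch.nn as nn

class LSTMinLSTM(nn.Module):
    """Simplified LSTM-in-LSTM: the inner LSTM encodes spatially
    concurrent visual cues into a context vector; the outer LSTM
    generates words conditioned on that context and the previous word.
    Sizes and the fixed context are illustrative assumptions."""
    def __init__(self, vocab: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.inner = nn.LSTM(dim, dim, batch_first=True)  # over region features
        self.outer = nn.LSTMCell(2 * dim, dim)            # word generator
        self.out = nn.Linear(dim, vocab)

    def forward(self, regions, words):
        # regions: (B, R, dim) visual cues; words: (B, T) token ids
        _, (ctx, _) = self.inner(regions)  # final hidden state as context
        ctx = ctx.squeeze(0)
        h = torch.zeros_like(ctx)
        c = torch.zeros_like(ctx)
        logits = []
        for t in range(words.size(1)):
            inp = torch.cat([self.embed(words[:, t]), ctx], dim=-1)
            h, c = self.outer(inp, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, T, vocab)

# Toy usage: random region features and token ids.
model = LSTMinLSTM(vocab=1000)
scores = model(torch.randn(2, 5, 256), torch.randint(0, 1000, (2, 7)))
print(scores.shape)  # torch.Size([2, 7, 1000])
```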