Abstract
Image captioning is a deep learning task that combines the image and text domains and has a wide range of applications. Existing image captioning models target large English datasets, while small datasets and datasets in other languages have received little attention. This paper implements image captioning with several neural network models and verifies the effectiveness of different image captioning models on other languages. Experiments show that an adaptive attention image captioning model using a pre-trained ResNet101 performs well on the multilingual Flickr8k dataset, reaching a BLEU-4 score of 20.4 and a CIDEr score of 54, and that the model is also effective on non-English languages such as Chinese and Russian.
Authors
张大任
艾山·吾买尔
ZHANG Daren; AISHAN Wumaier (College of Information Science and Engineering, Xinjiang University, Urumqi 830046; Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, Urumqi 830046)
Source
《现代计算机》
2021, Issue 19, pp. 117-123 (7 pages)
Modern Computer