Abstract
To address the low caption quality of existing image captioning models in non-English languages and their restriction to a single output language, a multilingual image captioning model based on a hybrid CNN-Transformer architecture is proposed. A CNN first extracts image features, which serve as the input to the Transformer encoder. The decoder input consists of captions in six languages that have been prefixed with language tags, word-segmented, and Latinized. During training, the sum of the losses over the different languages is used as the optimization objective, so that the languages are trained jointly. A multilingual image captioning dataset covering six languages was built by extending Flickr8K, and the model was evaluated on this dataset. The results show that the model can generate image captions in multiple languages simultaneously and that its captions are of higher quality than those of single-language models of the same scale, which verifies the effectiveness of the method.
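The two training-side ideas in the abstract, language tags on the decoder input and a summed per-language loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the tag format `<en>`, the language codes, and the helper names `add_language_tag`, `cross_entropy`, and `joint_loss` are all assumptions made for illustration, and the abstract does not name the six languages.

```python
import math


def add_language_tag(tokens, lang):
    """Prepend a language tag so the decoder knows which language to emit.

    The "<lang>" tag format is hypothetical; the abstract only says that
    language tags are added to the decoder input.
    """
    return [f"<{lang}>"] + list(tokens)


def cross_entropy(probs, target_ids):
    """Mean negative log-likelihood of the target tokens.

    probs[t] is the predicted distribution over the vocabulary at step t,
    and target_ids[t] is the index of the reference token at step t.
    """
    total = sum(math.log(probs[t][y]) for t, y in enumerate(target_ids))
    return -total / len(target_ids)


def joint_loss(per_language_batches):
    """Sum the per-language losses into a single training objective.

    per_language_batches maps a language code to a (probs, target_ids) pair.
    """
    return sum(cross_entropy(p, y) for p, y in per_language_batches.values())


# Toy usage: two hypothetical languages, uniform predictions over 4 tokens.
uniform = [[0.25, 0.25, 0.25, 0.25]] * 3
batches = {"en": (uniform, [0, 1, 2]), "zh": (uniform, [3, 2, 1])}
tagged = add_language_tag(["a", "dog", "runs"], "en")
loss = joint_loss(batches)
```

In a real model the distributions would come from the Transformer decoder conditioned on the CNN image features, and the summed loss would be backpropagated through the shared parameters so that all languages are trained jointly.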
Authors
ZHANG Da-ren (张大任)
AISHAN Wumaier (艾山·吾买尔)
YI Nian (宜年)
LIU Wan-yue (刘婉月)
HAN Yue (韩越)
Affiliations: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, Urumqi 830046, China
Source
Journal of Northeast Normal University (Natural Science Edition) (《东北师大学报(自然科学版)》)
CAS
PKU Core Journal (北大核心)
2022, No. 2, pp. 68-75 (8 pages)
Funding
Research Project of the State Language Commission (ZDI135-54)
National Key Research and Development Program of China (2017YFB1002103)