Text-to-Chinese-painting Method Based on Multi-domain VQGAN
Abstract: With the development of generative adversarial networks (GANs), synthesizing images from textual descriptions has become an active research area. However, the textual descriptions used for image generation are usually in English, and the generated objects are mostly faces, flowers, birds, and the like. Few studies address the generation of Chinese paintings from Chinese descriptions. Text-to-image generation also tends to require a very large number of labeled image-text pairs, which makes datasets expensive to build. Advances in multimodal pre-training make it possible to guide the GAN generation process through optimization, which significantly reduces the demand for datasets and computational resources. In this study, a multi-domain vector quantization generative adversarial network (VQGAN) model is proposed to generate Chinese paintings in multiple domains simultaneously. A multimodal pre-trained model, WenLan, is used to calculate the distance loss between generated images and textual descriptions. Semantic consistency between images and texts is achieved by optimizing the latent-space variables fed into the multi-domain VQGAN. Finally, ablation experiments compare different variants of the multi-domain VQGAN in terms of the FID and R-precision metrics, and a user study is carried out. The results show that the complete multi-domain VQGAN model outperforms the original VQGAN model in both image quality and text-image semantic consistency.
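The abstract describes optimizing the latent variables fed into the generator so that the generated image moves closer to the text description in a shared embedding space. The following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation. The function `toy_image_embedding` is a hypothetical stand-in for "decode the latent with the multi-domain VQGAN, then embed the image with WenLan". Cosine distance is an assumed choice of distance loss, and central finite differences replace the automatic differentiation a real system would use, so that the example stays dependency-free.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def toy_image_embedding(z, weights):
    """HYPOTHETICAL stand-in for 'decode z to an image, then embed the image'.

    A fixed linear map followed by tanh; it is not VQGAN or WenLan.
    """
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in weights]


def optimize_latent(text_emb, weights, z0, steps=200, lr=0.1, eps=1e-4):
    """Gradient descent on the latent z, keeping the generator fixed.

    Gradients are estimated with central finite differences (an assumption
    made for self-containment). Returns (best_z, best_loss, initial_loss).
    """
    def loss(v):
        return cosine_distance(toy_image_embedding(v, weights), text_emb)

    z = list(z0)
    initial_loss = loss(z)
    best_z, best_loss = list(z), initial_loss
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            plus = z[:]
            plus[i] += eps
            minus = z[:]
            minus[i] -= eps
            grad.append((loss(plus) - loss(minus)) / (2.0 * eps))
        # Only the latent is updated; the toy generator weights never change.
        z = [zi - lr * g for zi, g in zip(z, grad)]
        current = loss(z)
        if current < best_loss:
            best_z, best_loss = list(z), current
    return best_z, best_loss, initial_loss


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
    text_emb = [rng.gauss(0, 1) for _ in range(4)]  # stand-in text embedding
    z0 = [rng.gauss(0, 1) for _ in range(8)]        # initial latent
    _, best, start = optimize_latent(text_emb, weights, z0)
    print(f"distance before: {start:.4f}, after: {best:.4f}")
```

The point of the sketch is that the generator and the embedding model stay frozen while only the latent is updated, which is consistent with the abstract's claim that the approach reduces the need for paired training data.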
Authors: SUN Ze-Long (孙泽龙); YANG Guo-Xing (杨国兴); WEN Jing-Yuan (温静远); FEI Nan-Yi (费楠益); LU Zhi-Wu (卢志武); WEN Ji-Rong (文继荣). Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China.
Source: Journal of Software (《软件学报》), 2023, No. 5, pp. 2116-2133 (18 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (61976220, 61832017); Beijing Outstanding Young Scientist Program (BJJWZYJH012019100020098).
Keywords: text-to-image generation; multi-domain generation; Chinese painting generation