Abstract
Replacing foreign-language text in an image with Chinese while preserving the original visual style is an interesting and challenging problem. To address it, this paper proposes a pre-trained visual translation technique for cross-language conversion of text in images. The technique combines text detection, font recognition, OCR, image restoration, machine translation, and image rendering into a cross-modal adaptive translation and rendering model that preserves the style and layout of the original text. First, the EAST algorithm locates and extracts the text regions. Next, ResNet recognizes the font style, CTC-OCR extracts the text content, and a GPT model translates it. Finally, the LaMa algorithm repairs the text regions, and a region-coordinate rendering algorithm blends the translated text into the repaired image, achieving high-quality visual translation. Human evaluators quantitatively assessed the translation results; the method achieved a subjective evaluation score of 7.90, indicating high accuracy.
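The stage order described in the abstract can be sketched as a small pipeline. This is only an illustrative skeleton, not the authors' implementation: the `Region` and `VisualTranslator` names and every stage signature are hypothetical, and each stage is passed in as a plain callable so that real models (an EAST detector, a ResNet font classifier, a CTC-based OCR model, a GPT translator, a LaMa inpainter, a renderer) could be plugged in behind it.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical box format: (x, y, width, height) of one detected text region.
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    """Everything the pipeline learns about one text region."""
    box: Box
    font: str = ""
    source_text: str = ""
    translated_text: str = ""


@dataclass
class VisualTranslator:
    """Chains the stages in the order given in the abstract.

    Every field is an assumed interface for illustration only.
    """
    detect: Callable[[Any], List[Box]]        # text detection (e.g. EAST)
    classify_font: Callable[[Any, Box], str]  # font recognition (e.g. ResNet)
    recognize: Callable[[Any, Box], str]      # text recognition (e.g. CTC-OCR)
    translate: Callable[[str], str]           # machine translation (e.g. GPT)
    inpaint: Callable[[Any, List[Box]], Any]  # text removal (e.g. LaMa)
    render: Callable[[Any, Region], Any]      # draw text at the region's coordinates

    def run(self, image: Any) -> Tuple[Any, List[Region]]:
        # 1. Locate the text regions.
        regions = [Region(box=b) for b in self.detect(image)]
        # 2. Recognize font and content, then translate each region.
        for r in regions:
            r.font = self.classify_font(image, r.box)
            r.source_text = self.recognize(image, r.box)
            r.translated_text = self.translate(r.source_text)
        # 3. Erase the original text, then render translations in place.
        result = self.inpaint(image, [r.box for r in regions])
        for r in regions:
            result = self.render(result, r)
        return result, regions


if __name__ == "__main__":
    # Stub stages stand in for the real models; the "image" is just a list.
    demo = VisualTranslator(
        detect=lambda img: [(0, 0, 10, 5)],
        classify_font=lambda img, box: "serif",
        recognize=lambda img, box: "hello",
        translate=lambda text: "你好",
        inpaint=lambda img, boxes: img + ["inpainted"],
        render=lambda img, r: img + [(r.translated_text, r.font, r.box)],
    )
    output, found = demo.run(["image"])
    print(output)
```

Keeping each stage behind a callable keeps the ordering explicit (detection before recognition, inpainting before rendering) while leaving the choice of model for every step open.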
Authors
屈梦楠
靳宇浩
胡勃宁
QU Mengnan; JIN Yuhao; HU Boning (School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, China)
Source
《软件导刊》
2024, No. 6, pp. 59-66 (8 pages)
Software Guide
Keywords
visual translation
multi-modal
GPT
Chinese translation
neural network