Transformer-Based Multilingual Grapheme-to-Phoneme Conversion

Abstract: Grapheme-to-phoneme (G2P) conversion is an important component of the speech synthesis front end and directly affects the quality of synthesized speech. Most current G2P research targets a single language, yet in practical applications single-language synthesis is far less useful than multilingual synthesis. This paper therefore uses the Transformer architecture to study multilingual G2P conversion for English, Japanese, and Korean under mixed-language (cross-mixed) text conditions, with phoneme error rate (PER) and word error rate (WER) as evaluation metrics. English is evaluated on the CMUDict dataset of American English. For Korean and Japanese, the Korean and Japanese datasets from the SIGMORPHON 2021 G2P task are first augmented, and evaluation is performed on the augmented datasets. Experimental results show that, under mixed-language text conditions, the Transformer-based multilingual model for English, Japanese, and Korean achieves substantially lower PER and WER than Transformer-based monolingual models trained for each of the three languages.
Source: Computer Science and Application (计算机科学与应用), 2023, Issue 3, pp. 510-517 (8 pages).
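The abstract names PER and WER as metrics but does not give their formulas. A convention commonly used in G2P evaluation is: PER is the total Levenshtein edit distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and WER is the percentage of words whose predicted pronunciation is not an exact match of the reference (unlike ASR, where WER counts word-level edits in a sentence). The minimal sketch below assumes these definitions; the paper's exact computation may differ, and the function names `edit_distance` and `per_and_wer` are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def per_and_wer(refs: List[List[str]], hyps: List[List[str]]) -> Tuple[float, float]:
    """Return (PER %, WER %) under the assumed G2P definitions.

    PER: total phoneme edits / total reference phonemes.
    WER: share of words whose predicted pronunciation is not an exact match.
    """
    if len(refs) != len(hyps) or not refs:
        raise ValueError("need equally many, non-empty reference/hypothesis lists")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    wrong_words = sum(r != h for r, h in zip(refs, hyps))
    per = 100.0 * total_edits / total_phonemes
    wer = 100.0 * wrong_words / len(refs)
    return per, wer


if __name__ == "__main__":
    # Two words with ARPAbet pronunciations (the phone set used by CMUDict).
    refs = [["K", "AE1", "T"], ["D", "AO1", "G"]]  # "cat", "dog"
    hyps = [["K", "AE1", "T"], ["D", "AA1", "G"]]  # one vowel wrong in "dog"
    per, wer = per_and_wer(refs, hyps)
    print(f"PER={per:.2f}%  WER={wer:.2f}%")  # PER=16.67%  WER=50.00%
```

Here a single wrong vowel costs one edit out of six reference phonemes for PER, but makes one of the two words wrong for WER, which is why WER is typically much higher than PER for the same model.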