
Sub-syllable Representation-based Hmong Language Text-to-Speech Method
Abstract: Speech synthesis for minority languages supports the preservation, protection and development of ethnic cultures, yet research in this area remains limited. Synthesis errors occur readily when the same word carries different tones but is pronounced similarly. To address this problem, a sub-syllable representation-based text-to-speech (TTS) method for the Hmong language is proposed. The method uses sub-syllables as training units to represent Hmong pronunciation information, so that the model learns to distinguish similar pronunciations across different syllables. Because the alignment between the text sequence and the Mel-spectrogram is monotonic, a monotonic alignment loss is introduced to guide the attention module toward more accurate alignment, reducing word skipping, repetition and similar errors caused by the autoregressive nature of the attention mechanism. To verify the effectiveness of the proposed method, the self-built Hmong speech synthesis corpus HmongSpeech (download link: http://sxjxsf.gzmu.edu.cn/info/1728/1214.htm) is used as the benchmark dataset for comparative experiments against typical speech synthesis methods. The experimental results show that the proposed method reduces the synthesis errors caused by similar pronunciations of the same word with different tones, achieving a word error rate of only 0.96%, an improvement of 6.25% over the baseline method.
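The abstract does not specify how a Hmong syllable is divided into sub-syllables. As a rough illustration only, the sketch below assumes a romanized orthography in which each syllable consists of an initial, a final and an optional trailing tone letter, and splits it into three tokens. Under that assumption, two words that differ only in tone share their segmental tokens and differ in a single tone token. The tone-letter set `TONE_LETTERS`, the initial inventory `INITIALS`, the `T_` and `<sp>` tokens, and the functions `split_syllable` and `text_to_units` are all hypothetical; the paper's actual units and orthography may differ.

```python
from typing import List

# Hypothetical inventories for illustration only: the abstract does not say
# which Hmong orthography or sub-syllable units the paper uses.
TONE_LETTERS = set("bxdltskf")  # assumed syllable-final tone markers
INITIALS = sorted(
    ["gh", "hn", "ng", "nh", "b", "d", "g", "h", "k", "l", "m", "n", "s", "t", "x", "y", "z"],
    key=len,
    reverse=True,  # try longer initials first (longest match)
)


def split_syllable(syl: str) -> List[str]:
    """Split one romanized syllable into [initial, final, tone] tokens."""
    tone = "0"  # assumed label for a syllable with no written tone letter
    if len(syl) > 1 and syl[-1] in TONE_LETTERS:
        tone, syl = syl[-1], syl[:-1]
    # Longest initial that still leaves a non-empty final; "" if none matches.
    initial = next((i for i in INITIALS if syl.startswith(i) and len(syl) > len(i)), "")
    final = syl[len(initial):]
    return [initial or "<none>", final, "T_" + tone]


def text_to_units(text: str) -> List[str]:
    """Map space-separated syllables to a flat sub-syllable token sequence."""
    units: List[str] = []
    for k, syl in enumerate(text.lower().split()):
        if k:
            units.append("<sp>")  # hypothetical word-boundary token
        units.extend(split_syllable(syl))
    return units
```

With these assumptions, `dab` and `dax` map to `d a T_b` and `d a T_x`: the model sees identical segmental tokens plus one contrasting tone token, which is the kind of contrast a sub-syllable representation is meant to make explicit.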
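The exact form of the monotonic alignment loss is not given in the abstract. One plausible formulation, sketched below as an assumption rather than the paper's loss, tracks the expected text position (the attention centroid) at each decoder step and penalizes it for moving backward, which signals repetition, or for jumping ahead by more than `max_step` tokens, which signals skipping. NumPy is used only to show the arithmetic; in training the same expression would be written in a differentiable framework and added, with some weight, to the spectrogram reconstruction loss. The function name `monotonic_alignment_penalty` and the `max_step` parameter are hypothetical.

```python
import numpy as np


def monotonic_alignment_penalty(attn: np.ndarray, max_step: float = 1.0) -> float:
    """Hypothetical monotonicity penalty on an attention matrix.

    attn: array of shape (decoder_steps, text_tokens); each row is a
    probability distribution over text positions.
    """
    if attn.shape[0] < 2:
        return 0.0  # no consecutive steps to compare
    positions = np.arange(attn.shape[1], dtype=float)
    centroid = attn @ positions               # expected text index at each decoder step
    delta = np.diff(centroid)                 # movement between consecutive steps
    backward = np.maximum(-delta, 0.0)        # moving back: risk of repetition
    jump = np.maximum(delta - max_step, 0.0)  # jumping > max_step tokens: risk of skipping
    return float(np.mean(backward + jump))
```

A perfectly diagonal alignment such as `np.eye(3)` incurs no penalty. The centroid form only constrains the direction and size of each step; a diagonal prior such as the guided attention loss used in DCTTS is a common alternative when text and audio lengths are roughly proportional.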
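The 0.96% figure is a word error rate (WER): the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a recognized transcript, divided by the number of reference words. In TTS evaluation the recognized transcript usually comes from an ASR system or from listeners transcribing the synthesized audio; the abstract does not say which was used. The helper below is a generic sketch of the standard calculation, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between ref[:i-1] and hyp[:j] (previous DP row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    # Guard against an empty reference to avoid division by zero.
    return prev[-1] / max(len(ref), 1)
```

For example, recognizing `dab dab ma` against the reference `dab dax ma` counts one substitution out of three reference words, a WER of one third; a tone confusion of exactly this kind is what the proposed method aims to reduce.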
Authors: CAI Shan; WANG Lin; TAN Mian; GUO Sheng; WU Lei; WANG Fei (College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China; Key Laboratory of Pattern Recognition and Intelligent System of Guizhou Province, Guiyang 550025, China; College of Humanities & Sciences of Guizhou Minzu University, Guiyang 550025, China)
Source: Science Technology and Engineering (《科学技术与工程》), Peking University core journal, 2024, No. 19, pp. 8176-8185 (10 pages)
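Funding: National Natural Science Foundation of China (62162012); Guizhou Provincial Science and Technology Program (黔科合基础-ZK[2022]一般195, 黔科合基础-ZK[2023]一般143, 黔科合平台人才-ZCKJ[2021]007); Natural Science Research Project of the Guizhou Provincial Department of Education (黔教技[2023]061号, 黔教技[2023]012号, 黔教技[2022]015号); Guizhou Provincial Young Science and Technology Talent Growth Project (黔教合KY字[2021]115, 黔教合KY字[2021]110); Open Project of the Key Laboratory of Pattern Recognition and Intelligent System of Guizhou Province (GZMUKL[2022]KF01, GZMUKL[2022]KF05); Guizhou Provincial High-level Innovative Talent Project (黔科合平台人才-GCC[2023]027); Ministry of Education Industry-University Cooperative Education Project (221001766110209).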
Keywords: Hmong language text-to-speech; sub-syllable; monotonic alignment; corpus; Mel-spectrogram