Design and Implementation of an End-to-End Speech Synthesis System for Hokkien (cited by: 3)
Abstract: To better study the application of speech synthesis to Hokkien, a Hokkien database was built and Tacotron2 was verified to be an effective speech synthesis model for the language. On the data side, a Hokkien lexicon and phoneme system with local characteristics were established. On the modeling side, experiments compared Tacotron, Tacotron2, and a hybrid framework that combines different modules from the two, with the attention mechanism and other modules optimized. Starting from a Xiamen-accent Hokkien data set collected by Xiamen University, a Hokkien recognition model was used to decode the speech data into the corresponding phoneme sequences with punctuation marks, and a phoneme annotation system was built from a specially customized dictionary. Several groups of experiments compared the effects of sampling rate, modeling method, and model structure on synthesized audio quality and stability. Using Mel spectrograms and encoder-decoder alignment plots as evaluation criteria, the best combination of the three was identified.
Authors: YAN Shijiang; CHEN Yue; YAN Wanling; XU Binbin; LI Lin; HONG Qingyang (School of Informatics, Xiamen University, Xiamen 361005, China; School of Humanities, Xiamen University, Xiamen 361005, China; School of Electronic Science and Engineering, Xiamen University, Xiamen 361005, China)
Source: Journal of Xiamen University (Natural Science); indexed in CAS, CSCD, and the Peking University Core Journals list; 2020, No. 6, pp. 988-994 (7 pages)
Funding: National Natural Science Foundation of China (61876160).
Keywords: speech synthesis; end-to-end model; deep learning; Hokkien
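
The abstract mentions phoneme sequences that keep punctuation marks and a dictionary-based phoneme annotation system. The record does not give the dictionary or the phoneme inventory, so the following is only a minimal sketch of that general idea. The lexicon entries, the `<unk>` token, and the function name `to_phonemes` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy lexicon: romanized syllable (with tone digit) -> phoneme list.
# Illustrative only; these entries are NOT from the paper's dictionary.
TOY_LEXICON = {
    "li2": ["l", "i2"],
    "ho2": ["h", "o2"],
}

# Punctuation marks kept as standalone tokens in the output sequence.
PUNCTUATION = {",", ".", "?", "!"}


def to_phonemes(text, lexicon=TOY_LEXICON):
    """Convert romanized text to a phoneme sequence, preserving punctuation."""
    # Split into alphanumeric syllables and single punctuation marks.
    tokens = re.findall(r"[A-Za-z0-9]+|[,.?!]", text)
    phonemes = []
    for token in tokens:
        if token in PUNCTUATION:
            phonemes.append(token)            # keep punctuation as its own symbol
        elif token in lexicon:
            phonemes.extend(lexicon[token])   # expand syllable into phonemes
        else:
            phonemes.append("<unk>")          # assumed placeholder for unknown syllables
    return phonemes


if __name__ == "__main__":
    print(to_phonemes("li2 ho2, li2 ho2."))
```

Keeping punctuation as separate symbols is one common way to give an end-to-end model a cue for pauses; whether the authors encode it this way is an assumption here.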