
Voice conversion using STRAIGHT model and deep belief networks (cited by: 4)
Abstract: We propose a voice conversion method that combines the STRAIGHT model with deep belief networks (DBNs). First, the STRAIGHT model extracts the speech spectrum parameters of the source speaker and the target speaker; these parameters are used to train two DBNs, yielding each speaker's characteristic information in a high-order feature space. Second, an artificial neural network (ANN) connects the two high-order spaces and performs the feature conversion. Finally, the DBN trained on the target speaker's data performs the reverse pass on the converted features to recover spectral parameters, from which the STRAIGHT model synthesizes speech carrying the target speaker's personal characteristics. Experimental results show that, compared with the traditional GMM-based voice conversion method, the converted speech quality and speaker similarity of the proposed method are closer to the target voice.
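The four-step pipeline described in the abstract can be sketched as follows. This is a minimal numpy illustration of the data flow only, not the paper's implementation: the network weights are random stand-ins for the trained source/target DBNs and the ANN mapping, the dimensions `SPEC_DIM` and `HID_DIM` are assumed, and the random input frames take the place of real STRAIGHT spectral envelopes.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

SPEC_DIM, HID_DIM = 40, 64  # spectral-envelope dim and high-order feature dim (assumed sizes)

def make_layer(n_in, n_out):
    """Random weights and zero biases; a trained system would fit these."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# Stand-ins for the three trained networks in the pipeline
W_enc_src, b_enc_src = make_layer(SPEC_DIM, HID_DIM)  # source-speaker DBN (encoder)
W_ann, b_ann         = make_layer(HID_DIM, HID_DIM)   # ANN between high-order spaces
W_dec_tgt, b_dec_tgt = make_layer(HID_DIM, SPEC_DIM)  # target-speaker DBN (reverse pass)

def convert(src_spectra):
    # Steps 1-2: map source spectra into the source high-order space
    h_src = sigmoid(src_spectra @ W_enc_src + b_enc_src)
    # Step 3: ANN converts source high-order features to the target space
    h_tgt = sigmoid(h_src @ W_ann + b_ann)
    # Step 4: reverse pass of the target DBN recovers spectral parameters
    return sigmoid(h_tgt @ W_dec_tgt + b_dec_tgt)

frames = rng.random((100, SPEC_DIM))  # placeholder for STRAIGHT spectral envelopes
converted = convert(frames)
print(converted.shape)  # (100, 40)
```

The converted spectral parameters would then be handed back to STRAIGHT (together with the converted excitation parameters, which the sketch omits) to synthesize the output waveform.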
Source: Computer Engineering & Science (CSCD, Peking University core journal), 2016, No. 9, pp. 1950-1954 (5 pages)
Funding: Science and Technology Project of the Ministry of Housing and Urban-Rural Development (2016-R2-045); 2014 Science and Technology Program of Beilin District, Xi'an (GX1412)
Keywords: voice conversion; STRAIGHT model; deep belief networks; high-order spaces
