A Voice Conversion Algorithm in the Context of Sparse Training Data (cited by: 1)


Abstract: A new speaker voice conversion algorithm is proposed. It estimates the parameters of a Gaussian mixture model (GMM) with the variational Bayesian (VB) method and applies the resulting model to the vocal tract spectral parameter mapping stage of voice conversion (VC) to convert speaker identity. Using VB for parameter estimation has two advantages. First, it mitigates the over-fitting that readily occurs when training data are scarce. Second, because the model parameters are treated as random variables, estimation is no longer a point estimate, as in maximum likelihood (ML) or maximum a posteriori (MAP) estimation, but a "global" estimate over the parameter distribution, which improves model accuracy to some extent. Subjective and objective evaluations both show that using the VB-estimated statistical model for vocal tract spectral conversion markedly improves system robustness when training data are sparse, and that the converted speech surpasses a classical voice conversion system in both quality and speaker individuality.
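The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes time-aligned source and target spectral frames (how they are aligned and which features are used are not stated in the abstract), fits a GMM to the joint vectors with scikit-learn's `BayesianGaussianMixture` as a stand-in VB estimator, and converts by the standard joint-density conditional mean E[y | x]. The function names `fit_joint_vb_gmm` and `convert` are hypothetical. The conversion plugs in the posterior summaries that scikit-learn exposes (`weights_`, `means_`, `covariances_`), which is a simplification of a fully Bayesian predictive mapping.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def fit_joint_vb_gmm(src, tgt, n_components=4, seed=0):
    """Fit a GMM to joint [source, target] frames with variational Bayes.

    src, tgt: arrays of shape (T, d) holding time-aligned frames (assumption).
    """
    joint = np.hstack([src, tgt])  # shape (T, 2 * d)
    model = BayesianGaussianMixture(
        n_components=n_components,
        covariance_type="full",
        max_iter=500,
        random_state=seed,
    )
    return model.fit(joint)


def convert(model, src):
    """Map source frames to the target space via the conditional mean E[y | x]."""
    n_frames, d = src.shape
    w, mu, cov = model.weights_, model.means_, model.covariances_
    n_mix = len(w)
    log_resp = np.empty((n_frames, n_mix))
    cond_means = np.empty((n_mix, n_frames, mu.shape[1] - d))
    for m in range(n_mix):
        mu_x, mu_y = mu[m, :d], mu[m, d:]
        s_xx, s_yx = cov[m, :d, :d], cov[m, d:, :d]
        diff = src - mu_x
        solved = np.linalg.solve(s_xx, diff.T).T  # rows: S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        maha = np.sum(diff * solved, axis=1)
        # Log of w_m * N(x; mu_x, S_xx), up to a constant shared by all components.
        log_resp[:, m] = np.log(w[m]) - 0.5 * (maha + logdet)
        # Per-component regression: mu_y + S_yx S_xx^{-1} (x - mu_x).
        cond_means[m] = mu_y + solved @ s_yx.T
    # Normalize responsibilities p(m | x) in a numerically stable way.
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)
    # Responsibility-weighted sum of the per-component conditional means.
    return np.einsum("tm,mtk->tk", resp, cond_means)
```

In this sketch the priors placed on the weights, means, and covariances act as a regularizer, which is the mechanism the abstract credits for resisting over-fitting when few training frames are available. The number of components and all prior settings here are scikit-learn defaults or arbitrary choices, not values from the paper.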

Authors: XU Ning, YANG Zhen

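Affiliations: College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications; Institute of Signal Processing and Transmission, Nanjing University of Posts and Telecommunications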

Source: Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2010, No. 5, pp. 1-7 (7 pages)

Keywords: variational Bayesian estimation; Gaussian mixture model; voice conversion; vocal tract spectral parameters; sparse training data

Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]



