A Voice Conversion Algorithm Considering for Inter-frame Information


Abstract: The traditional weighted frequency warping (WFW) algorithm converts the speaker-identity features of each speech frame independently. It ignores the contextual information carried across neighboring frames. To address this, the paper proposes a modified weighted frequency warping (MWFW) algorithm, which uses compressed sensing (CS) to extract inter-frame correlation information from the speech signal. Instead of transforming features frame by frame, MWFW effectively converts the speech segment by segment. Objective tests and subjective listening evaluations show that MWFW's performance depends on the length of the speech segment. With an appropriately chosen segment length, it outperforms WFW.
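The abstract names weighted frequency warping (WFW) as the frame-by-frame baseline. The Python below is a minimal sketch of that baseline only. It is not the authors' MWFW, because the abstract does not say how the compressed-sensing step is formulated. It warps one frame's spectral envelope with a warping function built as a GMM-posterior-weighted sum of per-class piecewise-linear warping functions, which is the core idea of WFW as described by Erro et al. (reference 6). The sketch rests on several assumptions: a diagonal-covariance GMM, hand-picked knot frequencies (`src_knots`, `tgt_knots`), a uniform frequency grid, and no amplitude correction. All function names and toy values are hypothetical.

```python
import numpy as np


def gmm_posteriors(x, weights, means, variances):
    """Posterior p(k | x) of each component of a diagonal-covariance GMM.

    Hypothetical helper: the paper's GMM configuration is not given in the abstract.
    """
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_joint = np.log(weights) + log_norm + log_exp
    log_joint -= log_joint.max()  # stabilise before exponentiating
    post = np.exp(log_joint)
    return post / post.sum()


def weighted_frequency_warp(envelope, freqs, posteriors, src_knots, tgt_knots):
    """Warp one frame's envelope with W(f) = sum_k p_k * W_k(f).

    Each W_k is piecewise linear and maps src_knots[k] onto tgt_knots[k].
    Assumed: strictly increasing knots that pin 0 Hz and the Nyquist frequency,
    so W is increasing and covers the whole grid. Amplitude correction is omitted.
    """
    warp = np.zeros_like(freqs, dtype=float)
    for p_k, s_k, t_k in zip(posteriors, src_knots, tgt_knots):
        warp += p_k * np.interp(freqs, s_k, t_k)
    # The converted envelope takes the source value S(f) at frequency W(f),
    # so resample the points (W(f), S(f)) back onto the original grid.
    return np.interp(freqs, warp, envelope)


if __name__ == "__main__":
    fs = 16000
    freqs = np.linspace(0.0, fs / 2, 257)
    envelope = np.exp(-(((freqs - 500.0) / 100.0) ** 2))  # toy envelope with one peak at 500 Hz

    # Hypothetical 2-class GMM over 2-D frame features.
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0], [5.0, 5.0]])
    variances = np.ones((2, 2))
    post = gmm_posteriors(np.array([0.2, -0.1]), weights, means, variances)

    # Hypothetical knots: class 0 moves 500 Hz to 600 Hz, class 1 moves 1500 Hz to 1700 Hz.
    src = [np.array([0.0, 500.0, 8000.0]), np.array([0.0, 1500.0, 8000.0])]
    tgt = [np.array([0.0, 600.0, 8000.0]), np.array([0.0, 1700.0, 8000.0])]

    converted = weighted_frequency_warp(envelope, freqs, post, src, tgt)
    print("posteriors:", post)
    # The peak should move from 500 Hz to roughly 600 Hz on this grid.
    print("peak before: %.1f Hz, after: %.1f Hz"
          % (freqs[envelope.argmax()], freqs[converted.argmax()]))
```

Every frame is processed on its own here, so nothing links frame t to frame t+1. That is the limitation the abstract says MWFW addresses by converting segment by segment.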

Authors: Jian Zhihua (简志华), Wang Xiangwen (王向文)

Affiliation: School of Communication Engineering, Hangzhou Dianzi University

Source: Journal of Hangzhou Dianzi University: Natural Sciences, 2012, No. 4, pp. 33-36 (4 pages)

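Funding: Natural Science Foundation of Zhejiang Province (Y1101040); Scientific Research Fund of Zhejiang Provincial Education Department (Y201016542)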

Keywords: voice conversion; compressed sensing; frequency warping; Gaussian mixture model (GMM)

CLC number: TN911.23 [Electronics and Telecommunications: Communication and Information Systems]


References

[1] Zuo Guoyu, Liu Wenju, Ruan Xiaogang. Research and progress of voice conversion technology [J]. Acta Electronica Sinica, 2004, 32(7): 1165-1172.
[2] Abe M, Nakamura S, Shikano K, et al. Voice conversion through vector quantization [C]. New York: IEEE International Conference on Acoustics, Speech and Signal Processing, 1988: 655-658.
[3] Stylianou Y, Cappe O, Moulines E. Continuous probabilistic transform for voice conversion [J]. IEEE Transactions on Speech and Audio Processing, 1998, 6(2): 131-142.
[4] Kain A, Macon M W. Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction [C]. Salt Lake City: IEEE International Conference on Acoustics, Speech and Signal Processing, 2001: 813-816.
[5] Pribilova A, Pribil J. Non-linear frequency scale mapping voice conversion in text-to-speech system with cepstral description [J]. Speech Communication, 2006, 48(12): 1691-1703.
[6] Erro D, Moreno A, Bonafonte A. Voice conversion based on weighted frequency warping [J]. IEEE Transactions on Audio, Speech and Language Processing, 2010, 18(5): 922-931.
[7] Tropp J A, Gilbert A C. Signal recovery from random measurements via orthogonal matching pursuit [J]. IEEE Transactions on Information Theory, 2007, 53(12): 4655-4666.
[8] Kawahara H, Masuda-Katsuse I, Cheveigne A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds [J]. Speech Communication, 1999, 27(3): 187-207.
[9] Ye Hui, Young S. Quality-enhanced voice morphing using maximum likelihood transformations [J]. IEEE Transactions on Audio, Speech and Language Processing, 2006, 14(4): 1301-1312.