A voice conversion algorithm based on a multi-spectral feature generative adversarial network (cited by: 4)


Abstract: Voice conversion is widely used in education, entertainment, medicine, and other fields. To obtain high-quality converted speech, this paper proposes a voice conversion algorithm based on a multi-spectral feature generative adversarial network. A generative adversarial network converts the voiceprint images generated from spectral feature parameters, and feature-level multimodal fusion lets the network learn multiple kinds of information from different feature domains. This improves the network's perception of speech signals and yields high-quality converted speech with good clarity and intelligibility. Experimental results show that the proposed algorithm clearly outperforms traditional algorithms on both subjective and objective evaluation metrics.
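The feature-level fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which spectral features are used or how they are fused, so everything below is an assumption: the two feature maps (a log-magnitude spectrum and a cepstrally smoothed envelope), the frame sizes, and fusion by channel stacking are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping Hann-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)

def log_spectrum(frames):
    """Log-magnitude spectrum per frame, shape (n_frames, frame_len // 2 + 1)."""
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def cepstral_envelope(log_spec, n_keep=30):
    """Smooth spectral envelope obtained by keeping only low-quefrency cepstral terms."""
    cep = np.fft.irfft(log_spec, axis=1)      # real cepstrum of each frame
    lifter = np.zeros(cep.shape[1])
    lifter[:n_keep] = 1.0                     # low quefrencies
    lifter[-(n_keep - 1):] = 1.0              # their mirrored counterparts
    return np.fft.rfft(cep * lifter, axis=1).real

def normalize(m):
    """Scale a feature map into [0, 1] so it can be treated as an image channel."""
    return (m - m.min()) / (m.max() - m.min() + 1e-12)

def fuse_spectral_features(x):
    """Assumed fusion scheme: stack two spectral feature maps as image channels."""
    spec = log_spectrum(frame_signal(x))
    env = cepstral_envelope(spec)
    return np.stack([normalize(spec), normalize(env)], axis=0)

# Usage: one second of a synthetic 220 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
fused = fuse_spectral_features(np.sin(2 * np.pi * 220.0 * t))
print(fused.shape)  # (channels, frames, frequency bins)
```

Stacking along a channel axis gives a convolutional generator or discriminator every feature domain at each time-frequency position. Other fusion schemes, such as concatenating learned embeddings, would fit the abstract's description equally well.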

Authors: ZHANG Xiao; ZHANG Wei; WANG Wen-hao; WAN Yong-jing (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)

Affiliation: School of Information Science and Engineering, East China University of Science and Technology

Source: Computer Engineering & Science (indexed in CSCD and the Peking University Core Journals list), 2020, No. 5, pp. 893-901 (9 pages)

Keywords: voice conversion; voiceprint; generative adversarial network; multi-spectral feature; cross-domain reconstruction error

Classification code: TP391.42 [Automation and Computer Technology: Computer Application Technology]

