Research of whispered speech vocal tract system conversion based on universal background model and effective Gaussian components (Cited by: 1)


Abstract: To address the weaknesses of existing fixed-value mapping methods (method_F), a vocal tract system conversion method based on the universal background model (UBM) is proposed to improve the performance of a conversion system from Chinese whispered speech to normal speech. Because a UBM has numerous components, the errors produced by the acoustic probability density statistical model cannot be ignored. An effective Gaussian mixture component selection method, based on the posterior probability summation at minimum spectral distortion, is therefore developed to optimize system performance. The proposed method (method_U) is analyzed and compared using a performance index (PI) based on the Itakura-Saito spectral distortion measure. Experiments show that method_U performs more stably than method_F across different speakers and different phonemes, and that its average PI is better. Selecting effective Gaussian mixture components further improves the PI of method_U by 5.11%. Subjective listening tests also show that the proposed method improves the clarity and intelligibility of the converted speech.
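The abstract names the Itakura-Saito spectral distortion measure as the basis of the performance index but does not give its formula. As a rough illustration only, the sketch below computes the standard discrete Itakura-Saito distortion between two power spectra. The function name `itakura_saito`, the `eps` guard, and the averaging over frequency bins are assumptions made here; how the paper derives PI from this distortion is not stated in the abstract and is not reproduced.

```python
import numpy as np


def itakura_saito(p_ref, p_test, eps=1e-12):
    """Discrete Itakura-Saito distortion between two power spectra.

    p_ref and p_test are non-negative power spectra sampled on the same
    frequency bins. `eps` (an assumption of this sketch) guards against
    division by zero and log(0).
    """
    p_ref = np.asarray(p_ref, dtype=float) + eps
    p_test = np.asarray(p_test, dtype=float) + eps
    ratio = p_ref / p_test
    # Each term (r - log r - 1) is >= 0 and equals 0 only when r == 1.
    return float(np.mean(ratio - np.log(ratio) - 1.0))


if __name__ == "__main__":
    reference = [1.0, 2.0, 4.0]
    converted = [1.1, 1.8, 4.5]
    print(itakura_saito(reference, converted))
    # The measure is asymmetric: swapping the arguments changes the value.
    print(itakura_saito(converted, reference))
```

The measure is zero for identical spectra and is not symmetric in its arguments, so the choice of which spectrum serves as the reference matters when comparing conversion methods.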

Authors: CHEN Xueqin, ZHAO Heming

Affiliation: School of Electronic and Information Engineering

Source: Chinese Journal of Acoustics (声学学报, English edition), 2013, No. 4, pp. 400-410 (11 pages)

Funding: Supported by the National Natural Science Foundation of China (61071215), the Science and Technology Foundation of Suzhou (SYG201033), and the Pre-research Foundation of Soochow University (Q311901111, 14317399).

Keywords: Research of whispered speech vocal tract system conversion based on universal background model and effective Gaussian components; UBM

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]


