
Vocal effort in speaker recognition based on MAP+CMLLR
Cited by: 1
Abstract: To reduce the impact of vocal effort on speaker recognition performance, and assuming that the training speech contains a small amount of whispered and shouted data, a method is proposed that combines Maximum A Posteriori (MAP) adaptation with Constrained Maximum Likelihood Linear Regression (CMLLR) to update the speaker model and to project the speaker features. MAP adaptation updates the speaker models trained on normal speech, while the CMLLR feature-space projection transforms the features of whispered and shouted test speech, reducing the mismatch between training and test speech. Experiments show that the Equal Error Rate (EER) of the speaker recognition system drops markedly with MAP+CMLLR: compared with the baseline system, MAP adaptation, Maximum Likelihood Linear Regression (MLLR) model projection, and CMLLR feature-space projection, the average EER is reduced by 75.3%, 3.5%, 72%, and 70.9%, respectively. The results indicate that the proposed method weakens the effect of vocal effort on speaker discriminability and makes the speaker recognition system more robust to vocal effort variation.
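The abstract does not give implementation details, so the following is only a minimal sketch of the two building blocks it names, not the authors' implementation. It assumes a diagonal-covariance GMM speaker model, the common relevance-factor form of MAP adaptation applied to the means only, and a CMLLR transform that has already been estimated elsewhere. The function names, the relevance factor of 16, and the matrix `A` and bias `b` are illustrative assumptions.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each GMM component given one frame."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP update of the means only (assumed variant).

    Each mean moves toward the posterior-weighted mean of the adaptation
    frames; components that see little data stay close to the prior mean.
    """
    dim = len(means[0])
    counts = [0.0] * len(means)
    sums = [[0.0] * dim for _ in means]
    for x in frames:
        for k, gamma in enumerate(posteriors(x, weights, means, variances)):
            counts[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]
    adapted = []
    for k, mean in enumerate(means):
        if counts[k] > 0.0:
            data_mean = [s / counts[k] for s in sums[k]]
        else:
            data_mean = list(mean)
        alpha = counts[k] / (counts[k] + relevance)
        adapted.append(
            [alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, mean)]
        )
    return adapted


def apply_cmllr(frames, A, b):
    """Apply a given affine feature-space transform x' = A x + b per frame.

    Estimating A and b (the actual CMLLR step) is not shown here.
    """
    return [
        [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]
        for x in frames
    ]
```

In this sketch the model side is adapted with `map_adapt_means` using the small amount of whispered or shouted training data, while whispered or shouted test frames are passed through `apply_cmllr` before scoring, mirroring the division of labour the abstract describes.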
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2017, No. 3, pp. 906-910 (5 pages)
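Funding: Guizhou Province Social Development Key Research Program (黔科合SY字[2013]3105号); Guizhou Province Engineering Technology Research Center Construction Project (黔科合G字[2014]4002号)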
Keywords: speaker recognition; vocal effort; Maximum A Posteriori (MAP); Maximum Likelihood Linear Regression (MLLR); Constrained Maximum Likelihood Linear Regression (CMLLR)
