Research on Distal Supervised Learning Model of Speech Inversion (语音反演远端监督学习模型研究)

Cited by: 1
Abstract: Articulatory information is not readily available in typical speaker-listener situations. To address this, the paper proposes a speech inversion method that estimates articulatory information from the acoustic signal. It adopts distal supervised learning (DSL) as the machine learning strategy for speech inversion and analyzes the experimental background and theoretical basis of DSL. The paper proposes a global optimization approach for the DSL inverse model and uses eight tract variables as the articulatory information to model speech dynamics. It then compares the estimation results obtained when the speech signal is parameterized as acoustic parameters (APs) and as mel-frequency cepstral coefficients (MFCCs). The results show that DSL achieves good estimation performance for the tract variables.
Authors: Chen Ying (陈英), Zhang Shaobai (张少白)
Source: Computer Technology and Development (《计算机技术与发展》), 2013, No. 3, pp. 105-108 (4 pages)
Funding: National Natural Science Foundation of China (Grant No. 61073115)
Keywords: articulatory information; speech inversion; distal supervised learning (DSL); tract variables
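
The two-stage idea behind distal supervised learning, as summarized in the abstract, can be illustrated with a small toy sketch. This is not the paper's implementation: it assumes linear forward and inverse models, synthetic data, two articulatory dimensions instead of eight tract variables, and three acoustic dimensions instead of APs or MFCCs. The paper's global optimization approach is not reproduced here. All names (`train_dsl`, `TRUE_MAP`, `w_fwd`, `w_inv`) are hypothetical. Stage 1 fits a forward model from articulatory to acoustic space; stage 2 trains the inverse model using only the acoustic ("distal") error, passed back through the fixed forward model.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def residual(pred, target):
    """Element-wise difference pred - target."""
    return [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(pred, target)]

def mean_sq_error(err):
    """Squared error summed over dimensions, averaged over samples."""
    return sum(e * e for row in err for e in row) / len(err)

def gradient_step(weights, inputs, error, lr):
    """One gradient-descent step for a linear map: W -= lr * (2/N) * X^T @ error."""
    grad = matmul(transpose(inputs), error)
    n = len(inputs)
    return [[w - lr * 2.0 * g / n for w, g in zip(wr, gr)]
            for wr, gr in zip(weights, grad)]

def train_dsl(tv, ac, steps=200, lr=0.5):
    """Toy distal supervised learning with linear models (an assumption)."""
    n_tv, n_ac = len(tv[0]), len(ac[0])

    # Stage 1: forward model, articulatory -> acoustic, trained on paired data.
    w_fwd = [[0.0] * n_ac for _ in range(n_tv)]
    for _ in range(steps):
        w_fwd = gradient_step(w_fwd, tv, residual(matmul(tv, w_fwd), ac), lr)

    # Stage 2: inverse model, acoustic -> articulatory. The forward model is
    # held fixed; only the acoustic reconstruction error drives the update.
    w_inv = [[0.0] * n_tv for _ in range(n_ac)]
    history = []
    for _ in range(steps):
        err_ac = residual(matmul(matmul(ac, w_inv), w_fwd), ac)
        history.append(mean_sq_error(err_ac))
        # Back-propagate the distal (acoustic) error through the forward model.
        err_tv = matmul(err_ac, transpose(w_fwd))
        w_inv = gradient_step(w_inv, ac, err_tv, lr)
    return w_fwd, w_inv, history

# Synthetic data: a made-up articulatory-to-acoustic map with orthonormal rows.
TRUE_MAP = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
rng = random.Random(0)
tv = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(200)]
ac = matmul(tv, TRUE_MAP)

w_fwd, w_inv, history = train_dsl(tv, ac)
tv_error = mean_sq_error(residual(matmul(ac, w_inv), tv))
print("distal loss, first and last recorded:", history[0], history[-1])
print("articulatory error of the inverse model:", tv_error)
```

In this toy setting the acoustic-to-articulatory map is unique, so the inverse model recovers the articulatory values even though it never sees an articulatory target in stage 2. Real speech inversion is generally non-unique and nonlinear, which is why the actual study would need nonlinear models rather than the linear maps assumed here.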