
Speaker adapted dynamic lexicons containing phonetic deviations of words

Abstract: Speaker variability is an important source of speech variation, which makes continuous speech recognition a difficult task. Adapting automatic speech recognition (ASR) models to speaker variations is a well-known strategy for coping with this challenge. Almost all such techniques focus on developing adaptation solutions within the acoustic models of ASR systems. Although variations of the acoustic features constitute an important portion of inter-speaker variations, they do not cover variations at the phonetic level. Phonetic variations are known to form an important part of these variations and are influenced by both micro-segmental and suprasegmental factors. Inter-speaker phonetic variations are influenced by the structure and anatomy of a speaker's articulatory system and also by his/her speaking style, which is driven by many speaker background characteristics such as accent, gender, age, and socioeconomic and educational class. The effect of inter-speaker variations in the feature space may cause explicit phone recognition errors. These errors can be compensated for later by giving the lexicon entries appropriate pronunciation variants that account for likely phone misclassifications in addition to pronunciation. In this paper, we introduce speaker adaptive dynamic pronunciation models, which generate different lexicons for various speaker clusters and different ranges of speech rate. The models are hybrids of speaker adapted contextual rules and dynamic generalized decision trees, which take into account word phonological structures, rate of speech, unigram probabilities and stress to generate pronunciation variants of words. Employing the set of speaker adapted dynamic lexicons in a Farsi (Persian) continuous speech recognition task results in word error rate reductions of as much as 10.1% in a speaker-dependent scenario and 7.4% in a speaker-independent scenario.
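The idea of generating a separate lexicon per speaker cluster and speech-rate range can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Rule` class, the cluster name `cluster_a`, the rate thresholds, the single deletion rule and the toy entry `dast` are all assumptions made for illustration, and the decision-tree component, unigram probabilities and stress described in the abstract are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical contextual rule: rewrite `target` between `left` and `right`."""
    left: str         # required preceding phone; "" matches anything, "#" is a word edge
    target: str       # phone to rewrite
    right: str        # required following phone; "" matches anything, "#" is a word edge
    replacement: str  # new phone; "" deletes the target


def apply_rule(phones, rule):
    """Return a copy of `phones` with `rule` applied wherever its context matches."""
    out = []
    for i, phone in enumerate(phones):
        prev_phone = phones[i - 1] if i > 0 else "#"
        next_phone = phones[i + 1] if i + 1 < len(phones) else "#"
        matches = (
            phone == rule.target
            and rule.left in ("", prev_phone)
            and rule.right in ("", next_phone)
        )
        if matches:
            if rule.replacement:
                out.append(rule.replacement)
        else:
            out.append(phone)
    return out


def rate_bin(phones_per_second):
    """Map a speech rate to a coarse range (thresholds are illustrative only)."""
    if phones_per_second < 10.0:
        return "slow"
    if phones_per_second < 14.0:
        return "normal"
    return "fast"


# Assumed rule sets, one per (speaker cluster, rate range) pair.
RULES = {
    ("cluster_a", "normal"): [],
    ("cluster_a", "fast"): [Rule(left="s", target="t", right="#", replacement="")],
}

# Toy canonical lexicon; the entry is not taken from the paper.
CANONICAL = {"dast": ["d", "a", "s", "t"]}


def lexicon_for(cluster, phones_per_second):
    """Build the lexicon for one speaker cluster at the given speech rate."""
    rules = RULES.get((cluster, rate_bin(phones_per_second)), [])
    lexicon = {}
    for word, phones in CANONICAL.items():
        variants = [tuple(phones)]  # the canonical form is always kept
        for rule in rules:
            variant = tuple(apply_rule(phones, rule))
            if variant not in variants:
                variants.append(variant)
        lexicon[word] = variants
    return lexicon


if __name__ == "__main__":
    print(lexicon_for("cluster_a", 12.0))
    print(lexicon_for("cluster_a", 15.0))
```

In this sketch the recognizer would call `lexicon_for` with the estimated cluster and rate of the current speaker, so the set of pronunciation variants changes with the speaker rather than being fixed in advance.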
Source: Journal of Zhejiang University-Science A (Applied Physics & Engineering) [浙江大学学报(英文版)A辑(应用物理与工程)], 2009, No. 10, pp. 1461-1475 (15 pages). Indexed in SCIE, EI, CAS, CSCD.
Keywords: Pronunciation models; Continuous speech recognition; Lexicon adaptation; lexicon; deviation; speech; text