Factor Analysis for Language Identification Based on Phoneme Recognition
Abstract: In language identification systems based on phoneme recognition, the key issue is whether the tokens, or the token sequence, reflect language-related information. For a given utterance, however, the output of the phone recognizer is affected by nuisance factors such as the speaker, the channel, and background noise. To address this problem, each utterance is represented as a feature vector built from its phonotactic (n-gram) statistics. In this vector space, factor analysis is used to model the noise subspace, which is then removed during both language model training and recognition. On the NIST LRE 2007 evaluation task, the proposed method outperforms the baseline language identification systems based on phoneme recognition. In the 30 s test condition, the equal error rate (EER) is reduced by a relative 14.4% against the baseline phone recognition followed by language modeling (PRLM) system, and by a relative 12.9% against the baseline phone recognition followed by support vector machine (PRSVM) system.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2012, No. 1, pp. 105-110 (6 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Keywords: Automatic Language Identification; Factor Analysis; Phone Recognizer
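
The pipeline described in the abstract (an n-gram vector per utterance, then removal of a nuisance subspace) can be illustrated with a minimal sketch. This is not the paper's implementation: the record gives no model details, so the sketch swaps the factor analysis estimate for a simpler stand-in, namely the top principal directions of within-language scatter, which are then projected out. The function names (`bigram_vector`, `nuisance_subspace`, `remove_nuisance`), the three-phone inventory, and the toy utterances are all hypothetical, and every token is assumed to belong to the given phone set.

```python
from collections import Counter
from itertools import product

import numpy as np


def bigram_vector(tokens, phone_set):
    """Map a phone token sequence to a vector of relative bigram frequencies."""
    index = {bg: i for i, bg in enumerate(product(phone_set, repeat=2))}
    counts = Counter(zip(tokens, tokens[1:]))
    vec = np.zeros(len(index))
    for bg, c in counts.items():
        vec[index[bg]] = c
    total = vec.sum()
    return vec / total if total > 0 else vec


def nuisance_subspace(vectors, labels, rank):
    """Return an orthonormal basis (dim x rank) of within-language variability.

    Assumption: a PCA of within-language scatter stands in for the
    factor analysis model named in the abstract.
    """
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    # Subtract each language's mean so only within-language variation remains.
    centered = np.vstack([
        vectors[labels == lang] - vectors[labels == lang].mean(axis=0)
        for lang in np.unique(labels)
    ])
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def remove_nuisance(vec, basis):
    """Project out the component of vec that lies in the nuisance subspace."""
    return vec - basis @ (basis.T @ vec)


# Toy data (hypothetical): two "languages", two utterances each.
phone_set = ["a", "b", "c"]
train = [
    ("a b a b c a b".split(), "lang1"),
    ("a b b a b c a".split(), "lang1"),
    ("c c a c b c c".split(), "lang2"),
    ("c a c c b c a".split(), "lang2"),
]
X = np.array([bigram_vector(tokens, phone_set) for tokens, _ in train])
y = [lang for _, lang in train]
U = nuisance_subspace(X, y, rank=1)
X_clean = np.array([remove_nuisance(x, U) for x in X])
```

In this sketch the same projection would be applied to both training and test vectors before any language model or SVM is trained or scored, mirroring the abstract's statement that the noise subspace is removed in both training and recognition.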