Optimized data selection strategy based unsupervised acoustic modeling for low data resource speech recognition (Cited by: 2)
Abstract: Unsupervised acoustic model training is used to enlarge the training data and improve recognition performance under low-data-resource conditions. Within the standard unsupervised training framework, the conventional lattice-based word posterior confidence is extended to an utterance-level posterior confidence criterion for selecting hypothesized transcriptions. Selecting whole utterances ensures the reliability of each full sentence while preserving its context, which benefits cross-word triphone acoustic modeling more than word-level selection does. A phone-coverage-based selection criterion is also proposed: among hypotheses whose confidence is sufficiently reliable, it preferentially selects the phone units that are sparsest in the training data, attacking the low-resource problem at its source and making data selection more efficient. Experiments show that the improved unsupervised training method reduces the word error rate by about 8% relative to the supervised baseline, and by 2% absolute compared with conventional unsupervised training, substantially improving system performance in low-data-resource scenarios. (Illustrative sketches of the two selection criteria follow the record metadata below.)
Authors: QIAN Yanmin, LIU Jia
Source: Journal of Tsinghua University (Science and Technology), 2013, Issue 7: 1001-1004, 1010 (5 pages in total). Indexed by EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60931160443, 61273268, 90920302); National Key Technology R&D Program of China (2009BAH41B01)
Keywords: speech recognition; low data resource; unsupervised training; data selection
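
The utterance-level criterion is easy to see in outline. Below is a minimal Python sketch of the general idea, not the authors' implementation: the `Hypothesis` structure, the geometric-mean confidence, and the 0.9 threshold are illustrative assumptions, whereas the paper computes a sentence posterior probability directly from the decoding lattice.

```python
from dataclasses import dataclass
from math import exp, log
from typing import List

@dataclass
class Hypothesis:
    """One automatically transcribed utterance (illustrative structure)."""
    words: List[str]
    word_posteriors: List[float]  # per-word posteriors from the decoding lattice

def utterance_confidence(hyp: Hypothesis) -> float:
    """Geometric mean of the word posteriors, used here as a simple stand-in
    for the paper's lattice-derived sentence posterior confidence."""
    if not hyp.word_posteriors:
        return 0.0
    avg_log = sum(log(max(p, 1e-10)) for p in hyp.word_posteriors) / len(hyp.word_posteriors)
    return exp(avg_log)

def select_whole_utterances(hyps: List[Hypothesis], threshold: float = 0.9) -> List[Hypothesis]:
    """Keep or drop each utterance as a whole, so the surviving data retains
    full sentence context for cross-word triphone modeling."""
    return [h for h in hyps if utterance_confidence(h) >= threshold]
```

Accepting or rejecting whole utterances, rather than isolated high-confidence words, is what preserves the cross-word context the abstract emphasizes.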
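
The phone-coverage criterion can be sketched the same way. The greedy scoring below is an assumption about one reasonable realization; the paper states only that, among sufficiently confident hypotheses, the phone units sparsest in the existing training data are selected first. All identifiers are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

def rarity_score(phones: List[str], counts: Counter) -> float:
    """Average inverse frequency of an utterance's phones: utterances rich in
    rare phones score highest (illustrative scoring, not the paper's)."""
    if not phones:
        return 0.0
    return sum(1.0 / (1 + counts[p]) for p in phones) / len(phones)

def greedy_phone_coverage(
    candidates: List[Tuple[List[str], float]],  # (phone sequence, confidence)
    initial_counts: Counter,                    # phone counts in the existing training data
    confidence_threshold: float,
    budget: int,
) -> List[int]:
    """Greedily pick confident utterances that best cover rare phones,
    updating the counts after each pick so coverage spreads over the
    sparsest units first."""
    counts = Counter(initial_counts)
    remaining = {i for i, (_, conf) in enumerate(candidates) if conf >= confidence_threshold}
    picked: List[int] = []
    while remaining and len(picked) < budget:
        best = max(remaining, key=lambda i: rarity_score(candidates[i][0], counts))
        picked.append(best)
        remaining.remove(best)
        counts.update(candidates[best][0])  # phones just covered become cheaper
    return picked
```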
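
One note on reading the reported gains: a relative reduction scales the baseline error rate, while an absolute reduction subtracts from it. With a purely assumed baseline WER of 50% (not a figure from the paper), an 8% relative reduction would give 50% × 0.92 = 46% WER, whereas a 2% absolute reduction would give 50% − 2% = 48% WER.
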
Related literature

References (12)

  • 1 Rabiner L. A tutorial on hidden Markov models and selected applications in speech recognition [J]. Proceedings of the IEEE, 1989, 77(2): 257-286.
  • 2 Fung P, Schultz T. Multilingual spoken language processing [J]. IEEE Signal Processing Magazine, 2008, 25(3): 89-97.
  • 3 Qian Y, Xu J, Liu J. Multi-stream posterior features and combining subspace GMMs for low resource LVCSR [J]. Chinese Journal of Electronics, 2013, 22(2): 291-295. (Cited by: 2)
  • 4 Qian Y, Xu J, Povey D, et al. Strategies for using MLP based features with limited target-language training [C]// Proceedings on ASRU. Hawaii: IEEE Press, 2011: 354-358.
  • 5 Qian Y, Povey D, Liu J. State-level data borrowing for low-resource speech recognition based on subspace GMMs [C]// Proceedings on INTERSPEECH. Florence, Italy: ISCA, 2011: 553-556.
  • 6 Lamel L, Gauvain J L, Adda G. Lightly supervised and unsupervised acoustic model training [J]. Computer Speech and Language, 2002, 16(1): 115-129.
  • 7 Silva T F, Gauvain J L, Lamel L. Lattice-based unsupervised acoustic model training [C]// Proceedings on ICASSP. Prague, Czech Republic: IEEE Press, 2011: 4656-4659.
  • 8 Ma J, Matsoukas S, Kimball O, et al. Unsupervised training on large amounts of broadcast news data [C]// Proceedings on ICASSP. Toulouse, France: IEEE Press, 2006: 1056-1059.
  • 9 Siu M, Gish H. Evaluation of word confidence for speech recognition systems [J]. Computer Speech and Language, 1999, 13(4): 299-318.
  • 10 Wessel F, Ney H. Unsupervised training of acoustic models for large vocabulary continuous speech recognition [J]. IEEE Transactions on Speech and Audio Processing, 2005, 13(1): 23-31.

Secondary references (15)

  • 1 P. Fung and T. Schultz, "Multilingual spoken language processing", IEEE Signal Processing Magazine, Vol.25, No.3, pp.89-97, 2008.
  • 2 X. Cui, J. Xue et al., "Acoustic modeling with bootstrap and restructuring for low-resourced languages", Proc. of Interspeech, Makuhari, Japan, pp.2974-2977, 2010.
  • 3 H. Lin, L. Deng et al., "A study on multilingual acoustic modeling for large vocabulary ASR", Proc. of ICASSP, Taipei, Taiwan, China, pp.4333-4336, 2009.
  • 4 B.D. Walker, B.C. Lackey, J.S. Muller and P.J. Schone, "Language-reconfigurable universal phone recognition", Proc. of Eurospeech, Geneva, Switzerland, 2003.
  • 5 S.M. Siniscalchi, T. Svendsen and C.H. Lee, "Toward bottom-up continuous phone recognition", Proc. of ASRU, Kyoto, Japan, pp.566-569, 2007.
  • 6 S.M. Siniscalchi, T. Svendsen and C.H. Lee, "Toward a detector-based universal phone recognizer", Proc. of ICASSP, Las Vegas, Nevada, USA, pp.4261-4264, 2008.
  • 7 D. Povey, L. Burget et al., "The subspace Gaussian mixture model: a structured model for speech recognition", Computer Speech and Language, Vol.25, No.2, pp.404-439, 2011.
  • 8 Y. Qian, D. Povey, J. Liu, "State-level data borrowing for low-resource speech recognition based on subspace GMMs", Proc. of Interspeech, Florence, Italy, pp.553-556, 2011.
  • 9 A. Stolcke, "SRILM - an extensible language modeling toolkit", Proc. of ICSLP, Denver, Colorado, USA, pp.901-904, 2002.
  • 10 ICSI QuickNet Software Package, http://www.icsi.berkeley.edu/speech/qn.htm.

Co-citing literature (1)

Co-cited literature (21)

  • 1 Qian Y. Research on new methods for speech recognition under low data resource conditions [D]. Ph.D. dissertation, Tsinghua University, 2012: 67-85.
  • 2 Gunter S and Bunke H. Optimizing the number of states, training iterations and Gaussians in an HMM-based handwritten word recognizer[C]. 7th International Conference on Document Analysis and Recognition (ICDAR), Edinburgh, Scotland, UK, 2003: 472-476.
  • 3 Geiger J, Schenk J, Wallhoff F, et al. Optimizing the number of states for HMM-based on-line handwritten whiteboard recognition[C]. 12th International Conference on Frontiers in Handwriting Recognition (ICFHR), Kolkata, India, 2010: 107-112.
  • 4 Qing H, Chan C, and Chin-Hui L. Bayesian learning of the SCHMM parameters for speech recognition[C]. IEEE 19th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Adelaide, Australia, 1994, I: 221-224.
  • 5 Leggetter C J and Woodland P C. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models[J]. Computer Speech & Language, 1995, 9(2): 171-185.
  • 6 Ait-Mohand K, Paquet T, and Ragot N. Combining structure and parameter adaptation of HMMs for printed text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(9): 1716-1732.
  • 7 Ait-Mohand K, Paquet T, Ragot N, et al. Structure adaptation of HMM applied to OCR[C]. 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 2010: 2877-2880.
  • 8 Jiang Zhi-wei, Ding Xiao-qing, Peng Liang-rui, et al. Analyzing the information entropy of states to optimize the number of states in an HMM-based off-line handwritten Arabic word recognizer[C]. 21st International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, 2012: 697-700.
  • 9 Bicego M, Murino V, and Figueiredo M A T. A sequential pruning strategy for the selection of the number of states in hidden Markov models[J]. Pattern Recognition Letters, 2003, 24(9): 1395-1407.
  • 10 Seymore K, McCallum A, and Rosenfeld R. Learning hidden Markov model structure for information extraction[C]. AAAI-99 Workshop on Machine Learning for Information Extraction, Orlando, USA, 1999: 37-42.

Citing literature (2)

Secondary citing literature (11)
