Construction Algorithm of Sub-word Units in Speech Retrieval
Abstract: To address the Out-of-Vocabulary (OOV) problem in spoken keyword retrieval, this paper proposes a sub-word set construction algorithm based on Maximum Mutual Information and Minimum Description Length (MMI-MDL). Candidate pairs for merging are selected according to the mutual information of sub-word pairs, and the MDL criterion decides whether a pair is merged into a new sub-word. The resulting sub-word set is then used to map each word to a sequence of sub-words for retrieval. Experimental results show that, compared with the existing MDL sub-word set construction algorithm, the sub-word set obtained with MMI-MDL substantially improves retrieval performance: at the same precision, OOV recall improves by 12.1% relative to the MDL algorithm.
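The abstract gives only the outline of the procedure (rank sub-word pairs by mutual information, then accept a merge only if the MDL criterion agrees), so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes pointwise mutual information over adjacent units, a simplified two-part description length (inventory spelling cost plus a unigram code for the corpus), and a greedy loop that stops at the first rejected merge. All function names (`description_length`, `best_pair`, `merge_pair`, `build_subwords`, `segment`) and the `min_count` and `max_merges` parameters are hypothetical.

```python
import math
from collections import Counter


def description_length(corpus, alphabet_size):
    """Assumed two-part code length in bits: inventory cost plus data cost."""
    counts = Counter(unit for word in corpus for unit in word)
    total = sum(counts.values())
    # Model cost: spell every unit of the inventory with base symbols.
    model_bits = sum(len(unit) for unit in counts) * math.log2(max(alphabet_size, 2))
    # Data cost: encode the corpus with a unigram code over the units.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return model_bits + data_bits


def best_pair(corpus, min_count=2):
    """Return the adjacent unit pair with the highest pointwise mutual information."""
    unit_counts = Counter(unit for word in corpus for unit in word)
    pair_counts = Counter(pair for word in corpus for pair in zip(word, word[1:]))
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())
    best, best_mi = None, -math.inf
    for (a, b), c in pair_counts.items():
        if c < min_count:  # assumed frequency floor to avoid noisy rare pairs
            continue
        p_ab = c / n_pairs
        p_a, p_b = unit_counts[a] / n_units, unit_counts[b] / n_units
        mi = math.log2(p_ab / (p_a * p_b))
        if mi > best_mi:
            best, best_mi = (a, b), mi
    return best


def merge_pair(word, pair):
    """Replace each left-to-right occurrence of `pair` in `word` by one merged unit."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])  # tuple concatenation
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def build_subwords(words, max_merges=50):
    """Greedy MI-ranked merging, each merge gated by the description length."""
    corpus = [[(s,) for s in w] for w in words]  # each base symbol is a 1-tuple
    alphabet_size = len({s for w in words for s in w})
    merges = []
    dl = description_length(corpus, alphabet_size)
    for _ in range(max_merges):
        pair = best_pair(corpus)
        if pair is None:
            break
        candidate = [merge_pair(w, pair) for w in corpus]
        new_dl = description_length(candidate, alphabet_size)
        if new_dl >= dl:  # merge does not shorten the description: stop
            break
        corpus, dl = candidate, new_dl
        merges.append(pair)
    return merges


def segment(word, merges):
    """Map a (possibly unseen) word to sub-words by replaying the merges in order."""
    units = [(s,) for s in word]
    for pair in merges:
        units = merge_pair(units, pair)
    return units
```

In this sketch an OOV query word would be passed through `segment` with the learned merge list, and the resulting sub-word sequence would be the search key. Words are given here as character strings for brevity; phoneme sequences (lists of phone labels) work the same way.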
Source: Computer Engineering (《计算机工程》), 2012, No. 24, pp. 251-253, 257 (4 pages). Indexed in CAS and CSCD.
Funding: National Natural Science Foundation of China (61170197); Tsinghua University Initiative Scientific Research Program (2011thz0)
Keywords: Out-of-Vocabulary (OOV); speech retrieval; sub-word; Minimum Description Length (MDL); Maximum Mutual Information (MMI); word lattice network

