Construction Algorithm of Sub-word Units in Speech Retrieval (语音检索中子词单元的构建算法)


Abstract: To address the out-of-vocabulary (OOV) problem in spoken keyword retrieval, this paper proposes a sub-word set construction algorithm based on Maximum Mutual Information and Minimum Description Length (MMI-MDL). The algorithm selects candidate pairs according to the mutual information of sub-word pairs, then uses the MDL criterion to decide whether each pair should be merged into a new sub-word. The resulting sub-word set is used to map words into sub-word sequences for retrieval. Experimental results show that, compared with the existing MDL sub-word set construction algorithm, the sub-word set obtained by MMI-MDL substantially improves retrieval performance: at the same precision, the OOV recall rate is 12.1% higher in relative terms than with the MDL algorithm.
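The abstract gives only the outline of the procedure (pick a candidate pair by mutual information, then accept or reject the merge by MDL), so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes pointwise mutual information over adjacent units as the selection score, a two-part code (fixed-rate lexicon cost plus unigram corpus cost) as the description length, and letters as base symbols. The function names, `bits_per_symbol`, `min_count`, and `max_rejects` are all hypothetical.

```python
import math
from collections import Counter

def description_length(corpus, bits_per_symbol=5.0):
    """Assumed two-part code length in bits: lexicon cost + unigram corpus cost."""
    counts = Counter(u for word in corpus for u in word)
    total = sum(counts.values())
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    # Assumption: each unit in the lexicon is spelled out at a fixed rate per symbol.
    lexicon_bits = sum(len(u) * bits_per_symbol for u in counts)
    return corpus_bits + lexicon_bits

def best_pair_by_pmi(corpus, min_count=2, skip=()):
    """Return the adjacent unit pair with the highest pointwise mutual information."""
    units = Counter(u for word in corpus for u in word)
    pairs = Counter(p for word in corpus for p in zip(word, word[1:]))
    n_units, n_pairs = sum(units.values()), sum(pairs.values())
    best, best_pmi = None, -math.inf
    for (a, b), c in pairs.items():
        if c < min_count or (a, b) in skip:
            continue
        pmi = math.log2((c / n_pairs) / ((units[a] / n_units) * (units[b] / n_units)))
        if pmi > best_pmi:
            best, best_pmi = (a, b), pmi
    return best

def merge_pair(corpus, pair):
    """Replace every non-overlapping adjacent occurrence of `pair` with one unit."""
    a, b = pair
    merged = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                out.append(a + b)  # units are tuples, so this concatenates symbols
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

def build_subword_set(words, max_rejects=10):
    """Greedy loop: MI proposes a merge, the description length accepts or rejects it."""
    corpus = [[(ch,) for ch in w] for w in words]  # start from single symbols
    rejected = set()
    while len(rejected) < max_rejects:
        pair = best_pair_by_pmi(corpus, skip=rejected)
        if pair is None:
            break
        candidate = merge_pair(corpus, pair)
        if description_length(candidate) < description_length(corpus):
            corpus = candidate
        else:
            rejected.add(pair)
    return corpus

if __name__ == "__main__":
    toy_words = ["retrieval", "retrieve", "retrieved", "speech", "speaker"] * 3
    for word, units in zip(toy_words[:5], build_subword_set(toy_words)[:5]):
        print(word, ["".join(u) for u in units])
```

In this sketch the mutual information score only proposes merges, while the description length has the final say, which mirrors the two-stage decision described in the abstract. A real system would more likely operate on phoneme sequences from a pronunciation lexicon than on letters.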

Authors: 杨乐, 吴及, 吕萍

Affiliation: Department of Electronic Engineering, Tsinghua University

Source: Computer Engineering (《计算机工程》), CAS, CSCD, 2012, No. 24, pp. 251-253, 257 (4 pages)

Funding: National Natural Science Foundation of China (61170197); Tsinghua University Independent Research Program (2011thz0)

Keywords: Out-of-Vocabulary (OOV); speech retrieval; sub-word; Minimum Description Length (MDL); Maximum Mutual Information (MMI); word lattice network

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]



