
Learning New Words for Pronunciation Lexicon from Audio Data (Cited by: 1)
Abstract  A new-word learning method for pronunciation lexicons based on a hybrid speech recognition system is proposed to solve the problem that existing lexicon-expansion methods can only learn new words from text data and cannot learn them from audio data. The method first uses both a syllable-based and a graphone-based (letter-to-phoneme) hybrid recognition system to detect out-of-vocabulary words in the audio data, exploiting the complementarity of the two systems to obtain as many new words and pronunciation candidates as possible. The candidate words and pronunciations are then optimized with a perceptron model and a maximum entropy model to reduce the error rate. Finally, the lexicon is expanded and the language model parameters are updated using syntactic and semantic information. Continuous speech recognition experiments on the Wall Street Journal (WSJ) corpus show that the method effectively learns unknown new words from audio data and that the data optimization strategies greatly improve the accuracy of the learned words and pronunciations. In terms of word error rate, the system with the expanded lexicon achieves a relative improvement of about 13.4% over the baseline system.
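The pipeline summarized in the abstract, merging out-of-vocabulary candidates from two complementary recognizers and then filtering them with a linear discriminative model, can be sketched roughly as follows. All function names, features, weights, and example data here are illustrative assumptions, not the paper's implementation; the actual system uses trained perceptron and maximum entropy models over recognizer-derived features.

```python
# Illustrative sketch of the lexicon-expansion pipeline: union the OOV
# hypotheses of two complementary systems, then keep only candidates that
# pass a simple linear (perceptron-style) filter. Hypothetical throughout.
from collections import defaultdict

def merge_candidates(syllable_hyps, graphone_hyps):
    """Union (word, pronunciation, score) hypotheses from the syllable-based
    and graphone-based systems, keeping the best score per candidate pair."""
    merged = defaultdict(float)
    for word, pron, score in syllable_hyps + graphone_hyps:
        merged[(word, pron)] = max(merged[(word, pron)], score)
    return merged

def filter_candidates(merged, weights, threshold=0.0):
    """Toy linear filter standing in for the perceptron / maximum entropy
    optimization step: keep candidates whose weighted feature score clears
    a threshold, so fewer erroneous entries reach the lexicon."""
    kept = {}
    for (word, pron), acoustic_score in merged.items():
        features = {
            "acoustic": acoustic_score,
            # Ratio of phone count to letter count (a made-up feature).
            "len_ratio": len(pron.split()) / max(len(word), 1),
        }
        score = sum(weights.get(f, 0.0) * v for f, v in features.items())
        if score >= threshold:
            kept[word] = pron
    return kept

# Example: the two systems propose overlapping OOV candidates.
syl = [("berlusconi", "b er l uh s k ow n iy", 0.8)]
gra = [("berlusconi", "b er l uh s k ow n iy", 0.6),
       ("xcorp", "k s k ao r p", 0.2)]
lexicon = filter_candidates(merge_candidates(syl, gra),
                            weights={"acoustic": 1.0, "len_ratio": 0.1},
                            threshold=0.5)
# The high-confidence candidate survives; the low-scoring one is dropped.
```

In the paper the surviving entries would then be added to the pronunciation lexicon and the language model parameters re-estimated; that step is omitted here.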
Source: Journal of Xi'an Jiaotong University (EI, CAS, CSCD, Peking University Core), 2016, No. 6, pp. 75-82 (8 pages)
Funding: National Natural Science Foundation of China (61175017, 61403415, 61302107)
Keywords: speech recognition; pronunciation lexicon; new word learning; out-of-vocabulary words