The Confidence Measure Improvement by Combining Multi-source Knowledge Based on Hidden-units Conditional Random Fields in Automatic Speech Recognition

Original title: 基于隐藏单元条件随机场的多知识源融合改进自动语音识别置信度 (cited by: 1)

Abstract: Confidence estimation is difficult in Automatic Speech Recognition (ASR). This paper proposes a strategy that combines multi-source knowledge to make the confidence measure more discriminative. Specifically, information about the recognition result is first selected at the acoustic, linguistic, and semantic levels, and the way these sources are combined is then determined experimentally by held-out validation. The combined sources serve as features for computing the conditional probability of the recognition result under the framework of Hidden-units Conditional Random Fields (HuCRFs), and this HuCRFs conditional probability is used as a new confidence estimate for each recognition candidate. Experiments first show that the HuCRFs conditional probability is more discriminative than the conventional normalized lattice posterior probability. In addition, re-searching the first-pass decoder lattice for the best hypothesis with the HuCRFs-based confidence reduces the Character Error Rate (CER) by nearly 2% absolute compared with the first-pass best hypothesis on a benchmark corpus. The paper also compares the best hypothesis found with HuCRFs conditional probabilities against the one obtained by lattice rescoring with a long-distance language model; the comparison further supports the HuCRFs conditional probability as the better choice for confidence estimation in ASR.
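The lattice re-search step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each lattice arc already carries a confidence value in (0, 1] (standing in for a HuCRFs conditional probability, which is not computed here), that nodes are numbered in topological order, and that the best hypothesis is the path maximizing the sum of log confidences. The function name `best_path`, the arc format, and the toy lattice are hypothetical; the abstract does not specify the paper's actual search criterion.

```python
import math


def best_path(arcs, start, end):
    """Return (log_score, words) of the highest-confidence path.

    arcs: list of (src, dst, word, confidence) tuples, with src < dst,
    i.e. nodes are assumed to be numbered in topological order.
    """
    # best[node] = (best log score reaching node, word sequence so far)
    best = {start: (0.0, [])}
    # Visiting arcs by increasing source node guarantees that best[src]
    # is final before any arc leaving src is expanded.
    for src, dst, word, conf in sorted(arcs, key=lambda a: a[0]):
        if src not in best:
            continue  # source node is unreachable from the start node
        score = best[src][0] + math.log(conf)
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, best[src][1] + [word])
    return best[end]


# Toy lattice with made-up confidence values (illustrative only).
toy_arcs = [
    (0, 2, "recognize", 0.8),
    (0, 1, "wreck", 0.6),
    (1, 2, "a", 0.5),
    (2, 4, "speech", 0.9),
    (2, 3, "nice", 0.4),
    (3, 4, "beach", 0.7),
]
log_score, words = best_path(toy_arcs, start=0, end=4)
print(words, math.exp(log_score))
```

In this toy lattice the path "recognize speech" has the highest product of arc confidences, so it is the hypothesis returned. A real system would obtain the arc confidences from the trained model rather than from hand-set numbers.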

Authors: 高兴龙 (Gao Xinglong), 潘接林 (Pan Jielin), 颜永红 (Yan Yonghong)

Institution: Institute of Acoustics, Chinese Academy of Sciences

Source: Journal of Electronics & Information Technology (《电子与信息学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 8, pp. 1852-1858 (7 pages)

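Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National 863 Program (2012AA012503); Chinese Academy of Sciences Key Deployment Project (KGZD-EW-103-2)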

Keywords: Speech recognition; Confidence measure; Multi-source knowledge combination; Hidden-units Conditional Random Fields (HuCRFs); Lattice rescoring

Classification: TP391.42 [Automation and Computer Technology: Computer Application Technology]


