Abstract
Confidence estimation remains difficult in Automatic Speech Recognition (ASR). This paper proposes a multi-source knowledge combination strategy to improve the discriminative ability of confidence measures. Information about the recognition result is first selected at the acoustic, linguistic, and semantic levels, and the ways of combining these knowledge sources are then determined experimentally by held-out validation. The combined information is used as features within the framework of Hidden-units Conditional Random Fields (HuCRFs) to compute the conditional probability of the recognition result, and this conditional probability serves as a new confidence estimate for each recognition candidate. Experiments first show that the HuCRF conditional probability is more discriminative than the conventional normalized posterior probability computed from the lattice. The HuCRF-based confidence is then used to search the lattice produced by first-pass decoding again for the best hypothesis sequence, which reduces the Character Error Rate (CER) on a benchmark corpus by nearly 2% absolute relative to the first-pass best hypothesis. Finally, the best hypotheses found with HuCRF conditional probabilities are compared with those obtained by lattice rescoring with a long-distance language model, and the comparison further supports the HuCRF conditional probability as the better choice for confidence estimation in ASR.
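The lattice re-search step described in the abstract can be illustrated with a minimal sketch. It assumes a toy lattice whose arcs already carry a confidence score in (0, 1]; in the paper these scores would be HuCRF conditional probabilities, but here they are made-up numbers. The arc layout, the function name `best_path`, and the choice to score a path by the sum of log confidences are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def best_path(arcs, start, end):
    """Return (words, log_score) for the highest-scoring path from start to end.

    arcs: list of (src, dst, word, confidence) with 0 < confidence <= 1.
    Assumption: node ids are integers in topological order (src < dst).
    """
    outgoing = defaultdict(list)
    for src, dst, word, conf in arcs:
        outgoing[src].append((dst, word, math.log(conf)))

    # best[node] = (accumulated log confidence, back-pointer (prev_node, word))
    best = {start: (0.0, None)}
    for node in sorted(outgoing):  # increasing id == topological order
        if node not in best:
            continue  # node is not reachable from start
        score, _ = best[node]
        for dst, word, log_conf in outgoing[node]:
            candidate = score + log_conf
            if dst not in best or candidate > best[dst][0]:
                best[dst] = (candidate, (node, word))

    if end not in best:
        raise ValueError("end node is not reachable from start")

    # Follow back-pointers from end to start to recover the word sequence.
    words = []
    node = end
    while best[node][1] is not None:
        prev, word = best[node][1]
        words.append(word)
        node = prev
    words.reverse()
    return words, best[end][0]


# Toy lattice with hypothetical confidences (not data from the paper).
toy_arcs = [
    (0, 1, "A", 0.6),
    (0, 1, "B", 0.4),
    (1, 2, "C", 0.5),
    (1, 2, "D", 0.9),
    (0, 2, "E", 0.5),
]
words, log_score = best_path(toy_arcs, start=0, end=2)
print(words, math.exp(log_score))
```

Summing log confidences tends to favor paths with fewer arcs, so a practical system would likely need some length normalization or a combination with other scores; the abstract does not say how the authors handle this.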
Source
Journal of Electronics & Information Technology (《电子与信息学报》)
Indexed in: EI, CSCD, PKU Core Journals
2014, No. 8, pp. 1852-1858 (7 pages)
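Funding
National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500)
National High Technology Research and Development Program of China (863 Program) (2012AA012503)
Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)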
Keywords
Speech recognition
Confidence measure
Multi-source knowledge combination
Hidden-units Conditional Random Fields (HuCRFs)
Lattice rescoring