In speech recognition, acoustic modeling typically requires a tremendous number of transcribed samples, and transcription is intensively time-consuming and costly. To aid this labor-intensive process, Active Learning (AL) is adopted for speech recognition, whereby only the most informative training samples are selected for manual annotation. In this paper, we propose a novel active learning method for Chinese acoustic modeling: initial training set selection based on Kullback-Leibler divergence (KLD) and sample evaluation based on multi-level confusion networks are proposed and adopted in our active learning system. Our experiments show that the proposed method achieves satisfactory performance.
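The abstract does not spell out how the KLD criterion drives the initial selection; the sketch below shows one plausible reading, in which utterances are greedily chosen so that the phone distribution of the selected pool stays close (in KL divergence) to the corpus-wide phone distribution. The function names, the greedy strategy, and the use of phone-count vectors are all assumptions for illustration, not the authors' actual procedure.

```python
# Hypothetical sketch of KLD-based initial training set selection:
# greedily add the utterance that keeps the pooled phone distribution
# closest to the corpus-wide distribution.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same phone set."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def select_initial_set(utt_phone_counts, corpus_dist, budget):
    """utt_phone_counts: per-utterance phone-count vectors; budget: #utterances to pick."""
    selected = []
    pooled = np.zeros_like(np.asarray(corpus_dist, dtype=float))
    remaining = set(range(len(utt_phone_counts)))
    for _ in range(budget):
        if not remaining:
            break
        best, best_kl = None, float("inf")
        for i in remaining:
            candidate = pooled + np.asarray(utt_phone_counts[i], dtype=float)
            kl = kl_divergence(candidate, corpus_dist)
            if kl < best_kl:
                best, best_kl = i, kl
        selected.append(best)
        pooled += np.asarray(utt_phone_counts[best], dtype=float)
        remaining.remove(best)
    return selected
```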
We describe a novel approach to Bayes risk (BR) decoding for speech recognition, in which we attempt to find the hypothesis that minimizes an estimate of the BR with respect to the minimum word error (MWE) metric. To achieve this, we propose improved forward and backward algorithms on the lattices, and the whole procedure is optimized recursively. The remarkable characteristics of the proposed approach are that the optimization procedure is expectation-maximization (EM)-like and that the form of the updated result is similar to that obtained with the confusion network (CN) decoding method. Experimental results indicate that the proposed method leads to an error reduction for both lattice rescoring and lattice-based system combination, compared with CN decoding, confusion network combination (CNC), and ROVER methods.
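The paper's contribution is lattice-based, with improved forward and backward algorithms; as a simpler illustration of the underlying Bayes risk objective itself, the sketch below approximates MWE-style risk over an N-best list, scoring each hypothesis by its posterior-weighted Levenshtein distance to the competing hypotheses. This is a generic minimum-Bayes-risk approximation, not the recursive lattice procedure described in the abstract.

```python
# Illustrative sketch (not the paper's lattice algorithm): Bayes risk decoding
# approximated over an N-best list, where a hypothesis's risk is its
# posterior-weighted word-level edit distance to the other hypotheses.
def edit_distance(ref, hyp):
    """Standard Levenshtein distance between two word sequences."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1]

def mbr_decode(nbest):
    """nbest: list of (word_list, posterior) pairs; returns the hypothesis
    with minimum expected word error under the posterior distribution."""
    best_hyp, best_risk = None, float("inf")
    for hyp, _ in nbest:
        risk = sum(p * edit_distance(ref, hyp) for ref, p in nbest)
        if risk < best_risk:
            best_hyp, best_risk = hyp, risk
    return best_hyp
```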
Bayes risk (BR) decoding methods have been widely investigated in the speech recognition area because of their flexibility, despite their additional complexity, compared with the maximum a posteriori (MAP) method with respect to minimum word error (MWE) optimization. This paper investigates two improved approaches to BR decoding, aiming at minimizing word error. The novelty of the proposed methods lies in the explicit optimization of the objective function, whose value is calculated by an improved forward algorithm on the lattice. The result of the first method is obtained by an expectation-maximization (EM)-like iteration, while the result of the second is achieved by traversing the confusion network (CN); both lead to an optimized objective function value through distinct approaches. Experimental results indicate that the proposed methods yield an error reduction for lattice rescoring, compared with traditional CN-based lattice rescoring.
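For context, the baseline that the second method traverses and improves upon is standard confusion-network consensus decoding, sketched below: each CN slot holds candidate words with posteriors (possibly including a null arc), and the consensus hypothesis takes the highest-posterior entry per slot. The slot representation and the "<eps>" null symbol are assumptions for illustration; the paper's own traversal optimizes its MWE objective rather than this per-slot argmax.

```python
# Illustrative sketch of standard confusion-network (CN) consensus decoding:
# pick the highest-posterior candidate in each slot and drop null arcs.
def cn_consensus(confusion_network, null_symbol="<eps>"):
    """confusion_network: list of slots, each a dict {word: posterior}."""
    hypothesis = []
    for slot in confusion_network:
        word = max(slot, key=slot.get)   # highest-posterior candidate in this slot
        if word != null_symbol:
            hypothesis.append(word)
    return hypothesis

# Example usage with a toy two-slot network.
cn = [{"the": 0.6, "a": 0.4}, {"cat": 0.5, "hat": 0.3, "<eps>": 0.2}]
print(cn_consensus(cn))  # ['the', 'cat']
```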
Acknowledgements: This study is supported by the National Natural Science Foundation of China (60705019), the National High-Tech Research and Development Program of China (2006AA010102 and 2007AA01Z417), the NOKIA project, and the 111 Project of China under Grant No. 1308004.