Funding: Supported by the National Natural Science Foundation of China under Grants No. 61005004, No. 61175011, and No. 61171193; the Next-Generation Broadband Wireless Mobile Communications Network Technology Key Project under Grant No. 2011ZX03002-005-01; the One Church, One Family, One Purpose (111 Project) under Grant No. B08004; the Key Project of Ministry of Science and Technology of China under Grant No. 2012ZX03002019-002; and the National High Technical Research and Development Program of China (863 Program) under Grant No. 2011AA01A205.
Abstract: The inclusion of more potentially correct words in the candidate sets is important for improving the accuracy of Large Vocabulary Continuous Speech Recognition (LVCSR). A candidate expansion algorithm based on the Weighted Syllable Confusion Matrix (WSCM) is proposed. First, the WSCM is derived from a confusion network. Then, the recognised candidates in the confusion network are used to conjecture the most likely correct words based on the WSCM, after which the conjectured words are combined with the recognised candidates to produce an expanded candidate set. Finally, a model combining mutual information with a trigram language model is used to rerank the candidates. Experiments on Mandarin film data show an improvement of 9.57% in the character correction rate over the initial recognition performance on lightly erroneous utterances.
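The expansion step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `build_wscm` and `expand_candidates`, the toy lexicon, the toneless syllable labels, the per-syllable product score, and the threshold value are all assumptions made for the example. The reranking stage (mutual information plus trigram language model) is not shown.

```python
from collections import defaultdict


def build_wscm(aligned_pairs):
    """Build a weighted syllable confusion matrix (hypothetical helper).

    aligned_pairs: iterable of (recognised_syllable, reference_syllable, weight).
    The weight is assumed to be a posterior taken from the confusion network.
    Returns {recognised: {reference: probability}}, each row normalised to 1.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for recognised, reference, weight in aligned_pairs:
        counts[recognised][reference] += weight
    wscm = {}
    for recognised, row in counts.items():
        total = sum(row.values())
        wscm[recognised] = {ref: w / total for ref, w in row.items()}
    return wscm


def confusion_score(recognised_syllables, candidate_syllables, wscm):
    """Product of per-syllable confusion probabilities (assumed scoring rule)."""
    if len(recognised_syllables) != len(candidate_syllables):
        return 0.0
    score = 1.0
    for rec, cand in zip(recognised_syllables, candidate_syllables):
        score *= wscm.get(rec, {}).get(cand, 0.0)
    return score


def expand_candidates(recognised_words, lexicon, wscm, threshold=0.1):
    """Add lexicon words whose syllables are likely confusions of a recognised word.

    lexicon maps each word to its tuple of syllables.
    """
    expanded = list(recognised_words)
    for rec_word in recognised_words:
        for word, syllables in lexicon.items():
            if word in expanded:
                continue
            if confusion_score(lexicon[rec_word], syllables, wscm) >= threshold:
                expanded.append(word)
    return expanded


# Toy data: the recogniser output "shi" was truly "shi" or "si" in aligned data.
pairs = [("shi", "shi", 0.5), ("shi", "si", 0.25), ("shi", "shi", 0.25)]
toy_lexicon = {
    "shi_is": ("shi",),
    "si_four": ("si",),
    "shi_matter": ("shi",),
    "shan_mountain": ("shan",),
}
toy_wscm = build_wscm(pairs)
print(expand_candidates(["shi_is"], toy_lexicon, toy_wscm, threshold=0.2))
```

In this toy run the homophone and the confusable "si" word join the candidate set, while "shan" is left out because it never appears as a confusion of "shi" in the aligned pairs.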
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11301031).
Abstract: Multiply robust inference has attracted much attention recently in the context of missing response data. An estimation procedure is multiply robust if it can incorporate information from multiple candidate models while the resulting estimator remains consistent as long as one of the candidate models is correctly specified. This property is appealing, since it provides the user with a flexible modeling strategy that offers better protection against model misspecification. We explore this attractive property for regression models with a binary covariate that is missing at random. We start from a reformulation of the celebrated augmented inverse probability weighted estimating equation and, based on this reformulation, propose a novel combination of least squares and empirical likelihood to separately handle each of the two types of multiple candidate models, one for the missing variable regression and the other for the missingness mechanism. Due to this separation, all the working models are fused concisely and effectively. The asymptotic normality of our estimator is established through the theory of estimating functions with plugged-in nuisance parameter estimates. The finite-sample performance of our procedure is illustrated through both simulation studies and the analysis of a dementia data set collected by the National Alzheimer's Coordinating Center.
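As background for the augmented inverse probability weighted (AIPW) idea this abstract starts from, the sketch below shows the textbook AIPW estimator in its simplest setting: the mean of a variable that is missing at random. It is not the paper's estimator, which targets regression with a missing binary covariate and combines least squares with empirical likelihood. The function name `aipw_mean` and all inputs are hypothetical.

```python
def aipw_mean(values, observed, propensities, predictions):
    """Textbook AIPW estimate of a mean under missing-at-random data.

    values[i]       : the variable of interest (ignored when observed[i] == 0)
    observed[i]     : 1 if values[i] was observed, else 0
    propensities[i] : working-model probability that unit i is observed
    predictions[i]  : working-model prediction of values[i] from covariates
    """
    total = 0.0
    for y, r, pi, m in zip(values, observed, propensities, predictions):
        # Inverse-probability-weighted term, only for observed units.
        ipw_term = r * y / pi if r else 0.0
        # Augmentation term: has mean zero when the propensity model is right,
        # and cancels the weighting error when the prediction model is right.
        augmentation = (r - pi) / pi * m
        total += ipw_term - augmentation
    return total / len(observed)


# The prediction model is exactly right here, while the propensities are
# arbitrary, so the estimate equals the mean of the predictions.
y = [1.0, None, 3.0, None]
r = [1, 0, 1, 0]
pi = [0.5, 0.25, 0.5, 0.25]
m = [1.0, 2.0, 3.0, 4.0]
print(aipw_mean(y, r, pi, m))  # 2.5
```

The example exercises one half of double robustness exactly: with a correct prediction model, the propensities do not affect the result. The other half (a correct propensity model with a wrong prediction model) holds only in expectation, so it is not demonstrated with a fixed sample.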