A transformation matrix linear interpolation (TMLI) approach for speaker adaptation is proposed. TMLI uses the transformation matrices produced by MLLR from selected training speakers and the testing speaker. With only 3 adaptation sentences, it achieves a 12.12% word error rate reduction. As the number of adaptation sentences increases, however, the performance saturates quickly. To improve the behavior of TMLI for large amounts of adaptation data, the TMLI+MAP method, which combines TMLI with the MAP technique, is proposed. Experimental results show that TMLI+MAP achieves better recognition accuracy than MAP and MLLR+MAP for both small and large amounts of adaptation data.
Key words: speech recognition; speaker adaptation; MLLR; MAP; maximum likelihood model interpolation (MLMI)
CLC number: TN 912.34
Foundation item: Supported by the Science and Technology Committee of Shanghai (01JC14033)
Biography: XU Xiang-hua (1977-), female, Ph.D. candidate, research direction: large vocabulary continuous Mandarin speech recognition and speaker adaptation
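The interpolation idea in the TMLI abstract can be illustrated with a minimal sketch. It assumes the standard MLLR mean transform, in which an n x (n+1) matrix W maps the extended mean vector [1, mu] to an adapted mean, and it assumes the interpolation weights are already known; the abstract does not say how the weights are estimated. The names `interpolate_transforms` and `adapt_mean` are hypothetical and do not come from the paper.

```python
def interpolate_transforms(transforms, weights):
    """Weighted sum of same-shaped matrices (lists of rows).

    Assumption: weights are given; they are normalized here to sum to 1.
    """
    if not transforms or len(transforms) != len(weights):
        raise ValueError("need one weight per transformation matrix")
    total = float(sum(weights))
    if total == 0.0:
        raise ValueError("weights must not sum to zero")
    rows, cols = len(transforms[0]), len(transforms[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for matrix, weight in zip(transforms, weights):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += (weight / total) * matrix[i][j]
    return out


def adapt_mean(transform, mean):
    """Apply an MLLR-style mean transform: W @ [1, mu]."""
    extended = [1.0] + list(mean)
    return [sum(w * x for w, x in zip(row, extended)) for row in transform]


# Hypothetical 2-dimensional example: one identity transform and one
# transform that only shifts the mean by (+2, -2).
W_a = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W_b = [[2.0, 1.0, 0.0], [-2.0, 0.0, 1.0]]
W_mix = interpolate_transforms([W_a, W_b], [0.5, 0.5])
adapted = adapt_mean(W_mix, [3.0, 4.0])  # half of the shift is applied
```

With equal weights the interpolated transform applies half of the shift, so the mean (3, 4) moves to (4, 3). Only the weights need to be estimated from the test speaker's data in this sketch, which is one plausible reason such a method can work with very few adaptation sentences.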
A speaker adaptation method that combines transformation matrix linear interpolation with maximum a posteriori (MAP) adaptation is proposed. First, the method retains the asymptotic property of MAP. Second, because it linearly interpolates several speaker-dependent (SD) transformation matrices, it makes full use of prior knowledge and still adapts quickly. Experimental results show that the combined method achieves an 8.24% word error rate reduction with only one adaptation utterance and approaches the performance of the SD model asymptotically as the amount of adaptation data grows.
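The asymptotic property mentioned above can be seen in the textbook MAP update for a Gaussian mean, sketched below. This is a generic illustration, not the paper's formulation: it assumes each adaptation frame is assigned to the Gaussian with occupancy 1, and it treats the prior mean as given (in a combined scheme it could be a mean already adapted by interpolated transforms, which is an assumption here). The function name `map_mean` and the prior weight `tau` are hypothetical.

```python
def map_mean(prior_mean, frames, tau):
    """MAP estimate of a Gaussian mean.

    prior_mean: prior mean vector (list of floats)
    frames:     adaptation frames assigned to this Gaussian (list of vectors)
    tau:        prior weight; larger values trust the prior more
    Assumption: every frame has occupancy 1 (hard assignment).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# With one frame the estimate stays close to the prior; with many frames it
# approaches the sample mean of the adaptation data (here 10.0).
few = map_mean([0.0], [[10.0]], tau=9.0)
many = map_mean([0.0], [[10.0]] * 991, tau=9.0)
```

As the number of frames grows, the prior term is outweighed by the data term, which is why a MAP-based estimate tends toward the speaker-dependent estimate for large amounts of adaptation data.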
In recent years, the eigenvoice approach has proven to be an efficient method for rapid speaker adaptation; it guides the adaptation through analysis of the full speaker vector space. In this article, we develop a new algorithm for eigenspace-based adaptation that restricts eigenvoices to clustered subspaces and replaces the maximum likelihood (ML) criterion with the maximum a posteriori (MAP) criterion for better parameter estimation. Experiments show that even with a single sentence of adaptation data, the algorithm yields a 6.45% relative error rate reduction, which overcomes the instability of maximum likelihood linear regression (MLLR) with limited data and is much faster than the traditional MAP method. The algorithm is not highly dependent on the number of subspaces, which indicates that it is a robust adaptation algorithm.
A stronger canonical model was developed to improve the performance of automatic pronunciation evaluation. Three strategies were investigated: speaker adaptive training to normalize variations among speakers, minimum phone error training to identify easily confused phones, and maximum likelihood linear regression (MLLR) adaptation to compensate for accent variations between native and non-native speakers. Combining the three schemes improved the correlation coefficient between machine scores and human scores from 0.651 to 0.679 at the sentence level and from 0.788 to 0.822 at the speaker level.
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2008AA01Z118)