To improve Mandarin vowel pronunciation quality assessment, a novel formant feature was proposed and applied to formant classification for Mandarin vowel pronunciation quality evaluation. The formant candidates of each frame were plotted on the time-frequency plane to form a bitmap, and the Gabor feature of this bitmap was extracted to represent the formant trajectory. The feature was then classified with a Gaussian mixture model (GMM), and the classification posterior probability was mapped to a pronunciation quality grade. Experiments comparing the Gabor-transform-based formant trajectory feature with several traditionally used features show that this method achieves a human-machine scoring correlation coefficient (CC) of 0.842, which is better than the 0.832 obtained with traditional speech recognition techniques. Because the long-term information of formant classification and the short-term information of speech recognition are complementary, combining their results with linear or nonlinear methods was also investigated to further improve evaluation performance. Experiments on PSK show that the best CC, 0.913, was obtained with a neural network; this is very close to the inter-human rating correlation of 0.94.
Funding: Project (61062011) supported by the National Natural Science Foundation of China; Project (2010GXNSFA013128) supported by the Natural Science Foundation of Guangxi Province, China
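The feature-extraction step described in the abstract (formant candidates plotted as a time-frequency bitmap, then Gabor features taken from that bitmap) can be illustrated with a minimal sketch. The abstract does not give the bitmap resolution, the Gabor filter parameters, or how filter responses are pooled, so everything below is an assumption chosen for illustration: the function names (`rasterize`, `gabor_kernel`, `filter_energy`, `gabor_feature`), the 32 frequency bins, the 4000 Hz ceiling, the 7x7 kernels, and the mean-absolute-response pooling. The GMM classification and the posterior-to-grade mapping are not shown.

```python
import math

def rasterize(candidates, n_freq_bins=32, max_hz=4000.0):
    """Plot per-frame formant candidates (Hz) on a time-frequency bitmap.

    Returns bitmap[freq_bin][frame], 1.0 where a candidate falls (assumed layout).
    """
    n_frames = len(candidates)
    bitmap = [[0.0] * n_frames for _ in range(n_freq_bins)]
    for t, frame in enumerate(candidates):
        for hz in frame:
            if 0.0 <= hz < max_hz:
                bitmap[int(hz / max_hz * n_freq_bins)][t] = 1.0
    return bitmap

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2-D Gabor kernel; size is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the carrier oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_energy(bitmap, kernel):
    """Mean absolute kernel response over the bitmap, with zero padding."""
    rows, cols = len(bitmap), len(bitmap[0])
    half = len(kernel) // 2
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dy in range(-half, half + 1):
                rr = r + dy
                if not 0 <= rr < rows:
                    continue
                for dx in range(-half, half + 1):
                    cc = c + dx
                    if 0 <= cc < cols:
                        acc += bitmap[rr][cc] * kernel[dy + half][dx + half]
            total += abs(acc)
    return total / (rows * cols)

def gabor_feature(bitmap, n_orientations=4, wavelengths=(4.0, 8.0)):
    """One pooled response per (wavelength, orientation) pair."""
    feats = []
    for wl in wavelengths:
        for k in range(n_orientations):
            theta = math.pi * k / n_orientations
            kernel = gabor_kernel(7, wl, theta, sigma=wl / 2.0)
            feats.append(filter_energy(bitmap, kernel))
    return feats

# Two synthetic single-formant trajectories over 20 frames (made-up data).
flat = [[1000.0] for _ in range(20)]
rising = [[500.0 + 100.0 * t] for t in range(20)]
flat_feat = gabor_feature(rasterize(flat))
rising_feat = gabor_feature(rasterize(rising))
```

In this sketch a flat trajectory and a rising one produce different feature vectors, because the oriented kernels respond differently to a horizontal line and a sloped line in the bitmap. A feature vector of this kind would then be passed to the classifier; a real system would likely use an optimized filtering routine in place of the nested loops.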