Abstract: Conventional acoustic-to-articulatory inversion methods usually train the mapping model with a maximum-likelihood or least-squares criterion, which assumes that all articulatory channels are equally important. In this paper, the importance of each articulatory channel at each time instant is modeled as an exponential function of its velocity profile and incorporated into the conventional least-squares loss function. This loss function is used to optimize a batch-normalized deep neural network (DNN). Results show that the DNN trained with the proposed cost function outperforms the one trained with the traditional cost function.
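The abstract does not state the exact weighting function, so the following is only a minimal sketch of a velocity-weighted least-squares loss under explicit assumptions. The weight of channel c at frame t is taken as w = exp(-alpha * |v|). Here v is a finite-difference velocity of the reference articulatory trajectory, and `alpha` is a hypothetical scale parameter. The sign of the exponent, the velocity estimate, and the averaging are all assumptions, not the paper's formulation. Setting `alpha = 0` makes every weight 1, which recovers the conventional least-squares (mean squared error) loss. The sketch only computes the loss value; in the paper's setting it would be minimized by a batch-normalized DNN.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one frame of articulatory values, one entry per channel


def velocity_profile(traj: Sequence[Frame]) -> List[List[float]]:
    """Per-channel velocity by finite differences.

    Central differences are used for interior frames and one-sided
    differences at the two ends (assumed estimator, not from the paper).
    """
    num_frames, num_channels = len(traj), len(traj[0])
    vel = []
    for t in range(num_frames):
        lo, hi = max(t - 1, 0), min(t + 1, num_frames - 1)
        span = hi - lo  # 2 in the interior, 1 at the boundaries, 0 if only one frame
        vel.append([(traj[hi][c] - traj[lo][c]) / span if span else 0.0
                    for c in range(num_channels)])
    return vel


def channel_weights(target: Sequence[Frame], alpha: float = 1.0) -> List[List[float]]:
    """Hypothetical importance weights: w[t][c] = exp(-alpha * |v[t][c]|)."""
    return [[math.exp(-alpha * abs(v)) for v in frame]
            for frame in velocity_profile(target)]


def weighted_least_squares(pred: Sequence[Frame], target: Sequence[Frame],
                           alpha: float = 1.0) -> float:
    """Mean of w * (pred - target)^2 over all frames and channels."""
    weights = channel_weights(target, alpha)
    total, count = 0.0, 0
    for p_frame, y_frame, w_frame in zip(pred, target, weights):
        for p, y, w in zip(p_frame, y_frame, w_frame):
            total += w * (p - y) ** 2
            count += 1
    return total / count


# Usage: channel 0 moves steadily, channel 1 is static; every prediction is off by 1.
target = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
pred = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
print(weighted_least_squares(pred, target, alpha=0.0))  # plain MSE
print(weighted_least_squares(pred, target, alpha=1.0))  # moving channel down-weighted
```

Under this assumed form, errors on fast-moving channels count less than errors on slowly moving ones. Flipping the sign of `alpha` would emphasize fast-moving channels instead. Which direction matches the paper cannot be determined from the abstract.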
Funding: Supported by the National Natural Science Foundation of China (Nos. 61175016 and 61304250), the Key Fund project No. 61233009, and the CASS Innovation Project "Articulatory model for pronunciation training".