Abstract
Conventional acoustic-to-articulatory inversion methods usually train the mapping model with a maximum-likelihood or least-squares criterion, which assumes that all articulatory channels are equally important. In this paper, the importance of each articulatory channel at each time instant is modeled as an exponential function of its velocity profile and incorporated into the conventional least-squares loss function. The resulting loss function is used to optimize a batch-normalized Deep Neural Network (DNN). The results show that the DNN trained with the proposed cost function outperforms the one trained with the traditional cost function.
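The velocity-weighted loss described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact form of the weighting, so everything specific here is an assumption: the function names `velocity_weights` and `weighted_lsq_loss`, the scale parameter `alpha`, the choice of `exp(-alpha * |velocity|)` (which gives slow-moving channels more weight), and the normalization of the weights to a mean of 1. The paper's actual formulation may differ, including in the sign of the exponent.

```python
import numpy as np


def velocity_weights(traj, alpha=1.0):
    """Per-frame, per-channel importance weights (hypothetical form).

    traj  : array of shape (T, C), T frames by C articulatory channels.
    alpha : assumed scale parameter; alpha = 0 gives uniform weights.
    """
    # Finite-difference estimate of each channel's velocity over time.
    velocity = np.gradient(traj, axis=0)
    # Assumption: importance decays exponentially with speed.
    weights = np.exp(-alpha * np.abs(velocity))
    # Normalize so the average weight is 1, keeping the loss scale
    # comparable to the unweighted least-squares loss.
    return weights / weights.mean()


def plain_lsq_loss(pred, target):
    """Conventional least-squares loss: all channels weighted equally."""
    return float(np.mean((pred - target) ** 2))


def weighted_lsq_loss(pred, target, weights):
    """Least-squares loss with a weight for each frame and channel."""
    return float(np.mean(weights * (pred - target) ** 2))


# Toy example: channel 0 is static, channel 1 moves at constant speed.
target = np.stack([np.zeros(5), np.arange(5.0)], axis=1)
pred = target + 0.1  # constant prediction error on every channel
w = velocity_weights(target, alpha=1.0)
loss = weighted_lsq_loss(pred, target, w)
```

With `alpha = 0` the weights are all 1 and the weighted loss reduces to the conventional least-squares loss, which is the baseline the abstract compares against. In a DNN training setup, the weights would be computed from the reference trajectories and treated as constants during backpropagation.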
Source
Chinese Journal of Phonetics (《中国语音学报》)
2019, Issue 1, pp. 35-41 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (Nos. 61175016, 61304250)
Key Fund Project (No. 61233009)
Financial support from the CASS Innovation Project “Articulatory model for pronunciation training”