Abstract
As a minority language of China, Tibetan has until recently received far less speech recognition research than Chinese and English. This, together with the relatively small Tibetan corpus, has led to unsatisfactory performance of end-to-end Tibetan speech recognition. This paper aims to achieve accurate Tibetan speech recognition using a small amount of Tibetan training data. We investigate cross-language transfer learning for end-to-end Tibetan speech recognition from three aspects: modeling unit selection, transfer learning method, and source language selection. Experimental results show that Chinese-Tibetan multi-language learning with a multi-language character set as the modeling unit yields the best performance, with a Tibetan character error rate (CER) of 27.3%, a reduction of 26.1% compared with the language-specific model. Our method also achieves 2.2% higher accuracy while using less data than Tibetan multi-dialect transfer learning under the same model structure and data set.
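For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference and the hypothesis transcript, divided by the reference length. The function names `edit_distance` and `cer` are illustrative and not taken from the paper. The sketch treats each Unicode code point as one character, which is an assumption; the paper's exact definition of a Tibetan character and its scoring tool are not stated in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion all cost 1)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of r
                            curr[j - 1] + 1,      # insertion of h
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by the reference length.

    Assumption: one Unicode code point counts as one character.
    """
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, one substituted character in a four-character reference gives a CER of 0.25. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.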
Funding
This work was supported by three projects. Zhao Y received Grant Nos. 61976236 and 2020MDJC06.
Bi X J received Grant No. 20&ZD279.