As one of Chinese minority languages,Tibetan speech recognition technology was not researched upon as extensively as Chinese and English were until recently.This,along with the relatively small Tibetan corpus,has resu...As one of Chinese minority languages,Tibetan speech recognition technology was not researched upon as extensively as Chinese and English were until recently.This,along with the relatively small Tibetan corpus,has resulted in an unsatisfying performance of Tibetan speech recognition based on an end-to-end model.This paper aims to achieve an accurate Tibetan speech recognition using a small amount of Tibetan training data.We demonstrate effective methods of Tibetan end-to-end speech recognition via cross-language transfer learning from three aspects:modeling unit selection,transfer learning method,and source language selection.Experimental results show that the Chinese-Tibetan multi-language learning method using multilanguage character set as the modeling unit yields the best performance on Tibetan Character Error Rate(CER)at 27.3%,which is reduced by 26.1%compared to the language-specific model.And our method also achieves the 2.2%higher accuracy using less amount of data compared with the method using Tibetan multi-dialect transfer learning under the same model structure and data set.展开更多
藏语音存在语料库缺少和地区方言较多等问题,因此关于藏语音的识别技术相对缺乏。基于此,提出一种使用卷积神经网络(Convolutional Neural Network,CNN)、长短期记忆(Long Short Term Memory,LSTM)神经网路和动态神经网络(Dynamic Neura...藏语音存在语料库缺少和地区方言较多等问题,因此关于藏语音的识别技术相对缺乏。基于此,提出一种使用卷积神经网络(Convolutional Neural Network,CNN)、长短期记忆(Long Short Term Memory,LSTM)神经网路和动态神经网络(Dynamic Neural Network,DNN)的基于Python平台上TensorFlow框架的深度混合网络模型。首先,录制来自拉萨市、安多县和昌都市3个地区的藏语音数据制作语音数据集,并通过改进模型深度、结构、参数和算法来提升藏语音识别的准确率;其次,使用多层卷积残差网络和改进的LSTM神经网络解决模型训练过程中的梯度爆炸问题;最后,使用反向传播算法提高模型训练的准确度。仿真实验表明,该模型虽然在不同地区的藏语音数据识别准确率上存在差异,但是在整体的识别准确率和模型的收敛性上具有不错的效果。展开更多
为了使用计算机辅助语言学习系统(Computer Aided Language Learning,CALL)帮助藏族学生在学习普通话时及时发现和纠正错误发音,构建了一个适用于藏族学生普通话发音评估的语料库.从声母、韵母及声调的概念来比较分析藏语学生的普通话...为了使用计算机辅助语言学习系统(Computer Aided Language Learning,CALL)帮助藏族学生在学习普通话时及时发现和纠正错误发音,构建了一个适用于藏族学生普通话发音评估的语料库.从声母、韵母及声调的概念来比较分析藏语学生的普通话语音特征,归纳出藏族学生易混淆的声、韵、调,并进行文本语料的设计和语音语料的录制.对录制的音频文件用国际通用标注软件PRAAT进行分层标注,对标注好的语料进行分类编号.实验结果证明,该语料库可以及时纠正藏族学生学习普通话时的错误发音.展开更多
基金This work was supported by three projects.Zhao Y received the Grant with Nos.61976236 and 2020MDJC06Bi X J received the Grant with No.20&ZD279.
文摘As one of Chinese minority languages,Tibetan speech recognition technology was not researched upon as extensively as Chinese and English were until recently.This,along with the relatively small Tibetan corpus,has resulted in an unsatisfying performance of Tibetan speech recognition based on an end-to-end model.This paper aims to achieve an accurate Tibetan speech recognition using a small amount of Tibetan training data.We demonstrate effective methods of Tibetan end-to-end speech recognition via cross-language transfer learning from three aspects:modeling unit selection,transfer learning method,and source language selection.Experimental results show that the Chinese-Tibetan multi-language learning method using multilanguage character set as the modeling unit yields the best performance on Tibetan Character Error Rate(CER)at 27.3%,which is reduced by 26.1%compared to the language-specific model.And our method also achieves the 2.2%higher accuracy using less amount of data compared with the method using Tibetan multi-dialect transfer learning under the same model structure and data set.
文摘藏语音存在语料库缺少和地区方言较多等问题,因此关于藏语音的识别技术相对缺乏。基于此,提出一种使用卷积神经网络(Convolutional Neural Network,CNN)、长短期记忆(Long Short Term Memory,LSTM)神经网路和动态神经网络(Dynamic Neural Network,DNN)的基于Python平台上TensorFlow框架的深度混合网络模型。首先,录制来自拉萨市、安多县和昌都市3个地区的藏语音数据制作语音数据集,并通过改进模型深度、结构、参数和算法来提升藏语音识别的准确率;其次,使用多层卷积残差网络和改进的LSTM神经网络解决模型训练过程中的梯度爆炸问题;最后,使用反向传播算法提高模型训练的准确度。仿真实验表明,该模型虽然在不同地区的藏语音数据识别准确率上存在差异,但是在整体的识别准确率和模型的收敛性上具有不错的效果。
文摘为了使用计算机辅助语言学习系统(Computer Aided Language Learning,CALL)帮助藏族学生在学习普通话时及时发现和纠正错误发音,构建了一个适用于藏族学生普通话发音评估的语料库.从声母、韵母及声调的概念来比较分析藏语学生的普通话语音特征,归纳出藏族学生易混淆的声、韵、调,并进行文本语料的设计和语音语料的录制.对录制的音频文件用国际通用标注软件PRAAT进行分层标注,对标注好的语料进行分类编号.实验结果证明,该语料库可以及时纠正藏族学生学习普通话时的错误发音.