Abstract
In recent years, deep neural network models have become a major focus of research in speech recognition. However, deep networks rely on large numbers of parameters and heavy computation, and the resulting model size makes deployment on edge devices difficult. To address these problems, this paper proposes a lightweight Transformer-based speech recognition model (LM-Transformer). First, the model uses depthwise separable convolution to extract audio feature information. Second, it constructs two half-step residual feed-forward layers, i.e., the Macaron-Net structure, and introduces low-rank matrix factorization to compress the model. Finally, it uses a sparse attention mechanism to improve the model's training and decoding speed. The model was evaluated on the Aishell-1 and aidatatang_200zh datasets. Experimental results show that, compared with Open-Transformer, the proposed model achieves relative reductions of 19.8% in character error rate and 32.1% in real-time factor.
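To make the first step concrete, below is a minimal sketch of a depthwise separable convolution front-end for audio features, assuming a log-Mel spectrogram input of shape (batch, 1, time, freq). The channel counts, kernel size, and stride are illustrative assumptions, not the paper's exact configuration.

```python
# Depthwise separable convolution = depthwise conv (one filter per input
# channel) followed by a pointwise 1x1 conv that mixes channels. It needs
# roughly in*k*k + in*out parameters instead of in*out*k*k for a standard
# convolution, which is where the lightweighting comes from.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 2):
        super().__init__()
        # Depthwise: groups=in_ch gives each input channel its own filter.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise: 1x1 convolution recombines information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.pointwise(self.depthwise(x)))

x = torch.randn(4, 1, 400, 80)        # (batch, channel, frames, Mel bins)
front_end = nn.Sequential(
    DepthwiseSeparableConv(1, 64),     # halves both time and frequency axes
    DepthwiseSeparableConv(64, 128),   # halves them again (4x subsampling)
)
print(front_end(x).shape)              # torch.Size([4, 128, 100, 20])
```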
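The second step combines two ideas. A sketch of a Macaron-style encoder block is shown below: two half-step (0.5-weighted) residual feed-forward layers sandwiching self-attention, with each FFN weight matrix replaced by a low-rank factorization W ≈ UV (rank r ≪ d). All dimensions, the rank, and the pre-norm layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A d_in x d_out projection approximated by two rank-r factors."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.u = nn.Linear(d_in, rank, bias=False)  # d_in * r parameters
        self.v = nn.Linear(rank, d_out)             # r * d_out parameters

    def forward(self, x):
        return self.v(self.u(x))

class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, rank: int):
        super().__init__()
        self.net = nn.Sequential(
            LowRankLinear(d_model, d_ff, rank), nn.ReLU(),
            LowRankLinear(d_ff, d_model, rank),
        )

    def forward(self, x):
        return self.net(x)

class MacaronBlock(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, n_heads=4, rank=64):
        super().__init__()
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))
        self.ffn1 = FeedForward(d_model, d_ff, rank)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn2 = FeedForward(d_model, d_ff, rank)

    def forward(self, x):
        x = x + 0.5 * self.ffn1(self.norm1(x))  # first half-step residual FFN
        h = self.norm2(x)
        x = x + self.attn(h, h, h)[0]           # self-attention residual
        x = x + 0.5 * self.ffn2(self.norm3(x))  # second half-step residual FFN
        return x
```

With these illustrative sizes (d_model=256, d_ff=1024, rank=64), each full FFN projection would hold about 262k weights; the factorized version holds about 82k, a roughly 3x reduction per projection.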
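For the third step, the abstract does not specify which sparsity pattern the attention uses, so the sketch below shows one common scheme as an illustrative assumption: keep only the top-k scores per query and mask the rest before the softmax, so each query attends to a fixed small set of keys.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k: int = 8):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # k-th largest score in each query row becomes the cutoff.
    kth = scores.topk(top_k, dim=-1).values[..., -1:]
    # Everything below the cutoff is masked out of the softmax.
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 4, 100, 64)
out = topk_sparse_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 100, 64])
```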
Authors
沈逸文 (Shen Yiwen); 孙俊 (Sun Jun)
School of Artificial Intelligence & Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2023, No. 2, pp. 424-429 (6 pages)
Funding
National Natural Science Foundation of China (61672263);
Joint Funds of the National Natural Science Foundation of China (U1836218).