
Lightweight Chinese speech recognition with Transformer (结合Transformer的轻量化中文语音识别)

Cited by: 8
Abstract: In recent years, deep neural network models have become a major focus of speech recognition research. However, deep neural networks rely on a large number of parameters and heavy computation, and the resulting model size makes deployment on edge devices difficult. To address these problems, this paper proposes a lightweight speech recognition model based on Transformer. First, it uses depthwise separable convolution to extract audio feature information. Second, it constructs two half-step residual feed-forward layers, i.e. the Macaron-Net structure, and introduces low-rank matrix factorization to compress the model. Finally, it applies a sparse attention mechanism to improve training and decoding speed. The model was evaluated on the Aishell-1 and aidatatang_200zh datasets. Experimental results show that, compared with Open-Transformer, the proposed LM-Transformer achieves a relative reduction of 19.8% in character error rate and 32.1% in real-time factor.
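The abstract names its building blocks without further detail. Below is a minimal PyTorch sketch, not the authors' implementation, of two of them: a depthwise separable 1-D convolution front-end and a Macaron-style half-step feed-forward block whose linear projections are replaced by low-rank factorizations. All class names, dimensions, and the rank value are illustrative assumptions; the sparse attention mechanism is omitted.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution over time followed by a pointwise 1x1 convolution."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))

class LowRankLinear(nn.Module):
    """Approximate a d_in x d_out weight matrix by two rank-r factors,
    cutting parameters from d_in*d_out to r*(d_in + d_out)."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.u = nn.Linear(d_in, rank, bias=False)
        self.v = nn.Linear(rank, d_out, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.v(self.u(x))

class MacaronFeedForward(nn.Module):
    """Half-step feed-forward block: two such blocks wrap the attention layer,
    each contributing half of a full feed-forward residual update."""
    def __init__(self, d_model: int = 256, d_ff: int = 1024,
                 rank: int = 64, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            LowRankLinear(d_model, d_ff, rank),
            nn.ReLU(),
            nn.Dropout(dropout),
            LowRankLinear(d_ff, d_model, rank),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 0.5 * self.ff(self.norm(x))

With the assumed sizes (d_model=256, d_ff=1024, rank=64), each factorized projection stores about 82K parameters instead of roughly 262K for a full weight matrix, which is the kind of saving the abstract attributes to low-rank factorization.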
Authors: Shen Yiwen (沈逸文), Sun Jun (孙俊), School of Artificial Intelligence & Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2023, No. 2, pp. 424-429 (6 pages)
Funding: National Natural Science Foundation of China (61672263); NSFC Joint Fund (U1836218)
Keywords: speech recognition; Transformer; low-rank matrix factorization; lightweight convolution; model compression; sparse attention