
End-to-end Speech Recognition Based on Conformer-SE
Abstract: End-to-end Transformer models based on the self-attention mechanism perform very well in speech recognition. However, the Transformer is limited in its ability to capture local feature information in its shallow layers, and it does not fully account for the interdependence between different blocks. To address these issues, this study proposes Conformer-SE, an improved end-to-end speech recognition model. The model first replaces the Transformer encoder with the Conformer structure, which strengthens the extraction of local features. It then introduces an SE channel attention mechanism that integrates the output of each block into the final output as a weighted sum. Experiments on the public Aishell-1 dataset show that, compared with the original Transformer model, Conformer-SE achieves a relative reduction of 18.18% in character error rate.
Authors: MA Yong-Jie (马永杰); LI Gang (李罡) (School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, China; School of Mechanical and Control Engineering, Baicheng Normal University, Baicheng 137000, China)
Source: Computer Systems & Applications (《计算机系统应用》), 2024, No. 12, pp. 106-114 (9 pages)
Funding: 2022 Science and Technology Research Project of the Education Department of Jilin Province (JJKH20220013KJ); 2023 College Students' Innovation and Entrepreneurship Training Program (202310206035).
Keywords: speech recognition; end-to-end; Transformer; Conformer; SE attention channel
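
The abstract's second step, combining every encoder block's output into the final output as a weighted sum, can be sketched roughly as follows. This is only an illustration and not the authors' implementation. It assumes that "SE" denotes squeeze-and-excitation and that each block is treated as one channel: every block output is averaged to a single scalar, a small two-layer gate turns those scalars into one weight per block, and the block outputs are summed with those weights. All names (`squeeze`, `excite`, `se_fuse`), the sizes, and the random weights are hypothetical.

```python
import math
import random


def squeeze(block_outputs):
    """Global average pooling: reduce each block output (frames x dim) to one scalar."""
    return [sum(sum(frame) for frame in x) / (len(x) * len(x[0]))
            for x in block_outputs]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def excite(z, w1, w2):
    """Two-layer gate: ReLU bottleneck followed by a sigmoid, one weight per block."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]


def se_fuse(block_outputs, w1, w2):
    """Return the weighted sum of all block outputs and the per-block weights."""
    weights = excite(squeeze(block_outputs), w1, w2)
    n_frames, dim = len(block_outputs[0]), len(block_outputs[0][0])
    fused = [[sum(s * x[t][d] for s, x in zip(weights, block_outputs))
              for d in range(dim)]
             for t in range(n_frames)]
    return fused, weights


# Toy setup (assumed sizes): 4 encoder blocks, 3 frames, 2 features, bottleneck of 2.
random.seed(0)
n_blocks, n_frames, dim, hidden = 4, 3, 2, 2
blocks = [[[random.uniform(-1, 1) for _ in range(dim)]
           for _ in range(n_frames)]
          for _ in range(n_blocks)]
w1 = [[random.uniform(-1, 1) for _ in range(n_blocks)] for _ in range(hidden)]
w2 = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(n_blocks)]

fused, weights = se_fuse(blocks, w1, w2)
print(weights)  # one gate value in (0, 1) per encoder block
```

In a trained model the gate parameters `w1` and `w2` would be learned jointly with the encoder, so blocks whose representations help recognition would receive larger weights. Here they are random and serve only to show the data flow.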