Speech Recognition Model Based on Improved Linear Attention Mechanism

Cited by: 1


Abstract: The Conformer model has attracted growing attention for its strong performance and has gradually become the mainstream model in speech recognition. However, it relies on an attention mechanism that computes interactions between all positions of the input sequence, so its computational complexity grows with the square of the input sequence length. Recognizing long utterances therefore consumes more computing resources and is slow. To address this problem, this paper proposes a speech recognition method based on a linear attention mechanism. First, a novel gated linear attention structure is proposed that replaces multi-head attention with single-head attention and reduces the attention complexity to linear in the sequence length. Second, to compensate for the loss of modeling capacity caused by linear attention, local attention and global attention are combined in the linear attention computation, aided by linear attention positional encoding, to improve recognition accuracy. Finally, to further improve recognition performance, a guided attention loss and an intermediate connectionist temporal classification (CTC) loss are added to the objective function alongside the attention loss and the CTC loss. Experiments on the Mandarin Chinese dataset AISHELL-1 and the English LibriSpeech dataset show that the improved model clearly outperforms the baseline model, while GPU memory consumption decreases and training and recognition speed improve substantially.
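The complexity argument in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's gated linear attention structure, which the abstract does not specify. It shows only the generic kernelized form of linear attention, and the feature map `elu(x) + 1` and all function names are assumptions chosen for illustration. Softmax attention builds an n-by-n score matrix, while the kernelized form first reduces keys and values to a d-by-d summary, so its cost grows linearly with the sequence length n.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Standard scaled dot-product attention; forms an (n, n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n, n): quadratic in n
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def feature_map(x):
    """Positive feature map elu(x) + 1 (an illustrative assumption)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    """Kernelized attention that never forms the (n, n) matrix."""
    qf, kf = feature_map(q), feature_map(k)
    kv = kf.T @ v                  # (d, d_v) summary of keys and values
    z = kf.sum(axis=0)             # (d,) normalizer summary
    return (qf @ kv) / (qf @ z)[:, None]


def linear_attention_quadratic(q, k, v):
    """Reference: the same weights computed explicitly, for comparison only."""
    qf, kf = feature_map(q), feature_map(k)
    weights = qf @ kf.T            # (n, n)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
n, d = 50, 8                       # sequence length and feature size (arbitrary)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out_linear = linear_attention(q, k, v)
out_reference = linear_attention_quadratic(q, k, v)
```

The two kernelized functions return the same values up to floating-point error. The linear version simply reorders the matrix products so that the (n, n) intermediate is never materialized, which is where the memory and speed savings for long utterances would come from.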

Authors: LI Yiting (李宜亭); QU Dan (屈丹); YANG Xukui (杨绪魁); ZHANG Hao (张昊); SHEN Xiaolong (沈小龙) (College of Information Systems Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou, Henan 450001, China; Troops 95897 of PLA, Dalian, Liaoning 116001, China)

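Affiliations: College of Information Systems Engineering, PLA Strategic Support Force Information Engineering University; Chinese People's Liberation Army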

Source: Journal of Signal Processing (《信号处理》), CSCD, Peking University Core Journal, 2023, No. 3, pp. 516-525 (10 pages)

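Funding: National Natural Science Foundation of China (62171470); Zhongyuan Science and Technology Innovation Leading Talent Project of Henan Province (234200510019); General Program of the Natural Science Foundation of Henan Province (232300421240).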

Keywords: speech recognition; end-to-end; efficient attention; connectionist temporal classification; Conformer

Classification number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

