Abstract
Because of the excellent performance of the Transformer in sequence learning tasks such as natural language processing, an improved Transformer-like model suitable for speech emotion recognition tasks is proposed. To alleviate the prohibitive time consumption and memory footprint caused by the softmax operation inside the multi-head attention unit of the Transformer, a new linear self-attention algorithm is proposed. The original exponential function is replaced by a Taylor series expansion, and, on the basis of the associative property of matrix products, the time and space complexity of the softmax operation with respect to the input sequence length is reduced from O(N²) to O(N), where N is the sequence length. Experimental results on emotional speech corpora in two languages show that the proposed linear attention algorithm achieves performance similar to the original scaled dot-product attention, while training time and memory cost are reduced by half. Furthermore, the improved model obtains more robust speech emotion recognition performance than the original Transformer.
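The following is a minimal sketch of the idea described in the abstract: the exponential in the attention weights is approximated by its first-order Taylor expansion, 1 + q·k, after which the associative property of matrix products lets the key-value summary be computed once, giving O(N) cost in the sequence length. The function name, the L2 normalization of queries and keys (used here to keep 1 + q·k non-negative), and the single-head, unscaled formulation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Sketch of linear-complexity attention via a first-order Taylor expansion.

    Approximates exp(q·k) ≈ 1 + q·k, then uses associativity
    (Q K^T) V = Q (K^T V) so the N x N attention matrix is never formed.
    Q, K, V: arrays of shape (N, d).
    """
    N, d = Q.shape
    # Assumed normalization so that 1 + q·k stays non-negative.
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)

    # Numerator: sum_j (1 + q_i·k_j) v_j = sum_j v_j + q_i · (sum_j k_j v_j^T)
    kv = Kn.T @ V                  # (d, d) summary, costs O(N d^2)
    v_sum = V.sum(axis=0)          # (d,)
    numer = v_sum[None, :] + Qn @ kv

    # Denominator: sum_j (1 + q_i·k_j) = N + q_i · (sum_j k_j)
    k_sum = Kn.sum(axis=0)         # (d,)
    denom = N + Qn @ k_sum + eps   # (N,)

    return numer / denom[:, None]
```

Because the (d, d) summary Kn.T @ V and the (d,) sum of keys are computed once and reused for every query position, both time and memory grow linearly with N instead of quadratically, which is the source of the reported savings.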
Funding
The National Key Research and Development Program of China (No. 2020YFC2004002, 2020YFC2004003) and the National Natural Science Foundation of China (No. 61871213, 61673108, 61571106).