Abstract
Traditional recurrent neural networks are prone to vanishing gradients and network degradation. Non-saturated activation functions can effectively overcome vanishing gradients, and the residual structure used in convolutional neural networks can effectively alleviate network degradation. Drawing on both ideas, we propose a residual gated recurrent unit (Re-GRU), built on the gated recurrent unit (GRU), to alleviate these two problems. Re-GRU makes two main changes to the GRU: 1) the activation function of the candidate hidden state is replaced with a non-saturated activation function; 2) residual information is introduced into the candidate hidden state representation. The change to the candidate-state activation not only avoids the vanishing gradient caused by saturated activation functions, but also allows residual information to be introduced more effectively, making the network more sensitive to gradient changes and thereby alleviating degradation. We evaluated Re-GRU on three kinds of tasks: image recognition, language modeling, and speech recognition. In all three, Re-GRU outperformed the compared methods, while also running faster than Highway-GRU and the long short-term memory (LSTM) unit. In particular, it achieved a test perplexity of 23.88 on the Penn Treebank dataset in the language modeling task, roughly half of the lowest previously reported value.
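To make the two changes concrete, below is a minimal PyTorch sketch of what such a cell might look like. The class name ReGRUCell, the choice of ReLU as the non-saturated activation, and the placement of the residual term (the previous hidden state added to the candidate state) are illustrative assumptions; the abstract does not spell out the paper's exact equations.

```python
import torch
import torch.nn as nn


class ReGRUCell(nn.Module):
    """Minimal sketch of a residual gated recurrent unit (Re-GRU) cell.

    Hypothetical formulation: the abstract only states that (1) the
    candidate hidden state uses a non-saturated activation (ReLU is
    assumed here) and (2) residual information enters the candidate
    state; the exact placement below is an assumption, not the
    paper's verbatim equations.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Fused projections for the reset gate, update gate, and candidate.
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        x_r, x_z, x_n = self.x2h(x).chunk(3, dim=-1)
        h_r, h_z, h_n = self.h2h(h_prev).chunk(3, dim=-1)
        r = torch.sigmoid(x_r + h_r)  # reset gate
        z = torch.sigmoid(x_z + h_z)  # update gate
        # Non-saturated candidate activation plus a residual term
        # (assumed: previous hidden state added to the candidate).
        n = torch.relu(x_n + r * h_n) + h_prev
        # Standard GRU interpolation between old state and candidate.
        return (1.0 - z) * h_prev + z * n


# Usage sketch: one step over a batch of 4, 16-dim input, 32-dim state.
cell = ReGRUCell(16, 32)
h = torch.zeros(4, 32)
h = cell(torch.randn(4, 16), h)
```

Because ReLU is unbounded, the residual path lets gradients flow through the identity term rather than through a saturating nonlinearity, which is the mechanism the abstract credits for both the vanishing-gradient and degradation relief.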
Authors
ZHANG Zhong-Hao; DONG Fang-Min; HU Feng; WU Yi-Rong; SUN Shui-Fa
(College of Computer and Information Technology, China Three Gorges University, Yichang 443002; Yichang Key Laboratory of Intelligent Medicine, Yichang 443002)
Source
《自动化学报》 (Acta Automatica Sinica), 2022, No. 12, pp. 3067-3074 (8 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
Supported by the National Natural Science Foundation of China (U1703261, 61871258) and the National Key Research and Development Program of China (2016YFB0800403).
Keywords
Deep learning
recurrent neural networks
gated recurrent unit
residual connection (skip connection)