Abstract
Scoring functions for RNA secondary structure play an increasingly important role in RNA secondary structure prediction, yet existing scoring functions do not capture the RNA folding mechanism well. We argue that the way information is passed between the layers of a recurrent neural network resembles the way RNA folds, and we therefore propose scoring RNA secondary structures with a bidirectional Long Short-Term Memory (LSTM) neural network. Three experiments were conducted on the ASE dataset (sequence lengths below 500) and the CRW dataset (most sequence lengths above 1000). First, by fitting the sensitivity (SEN) and specificity (PPV) scoring functions, we determined that the fit is best when the objective function is mean_squared_error. Second, we fitted a more complex scoring function, the Matthews correlation coefficient (MCC). Third, the two-layer bidirectional LSTM model produced better results than the single-layer bidirectional LSTM model. The scoring function obtained through these experiments incorporates global properties of the base sequence. The results show that a deep LSTM neural network model can fit scoring functions for RNA secondary structure well.
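The abstract names SEN, PPV, and MCC as the scoring functions to be fitted but does not state how they are computed. The following is a minimal sketch of the usual base-pair-level definitions, assuming structures are given in dot-bracket notation without pseudoknots and that every candidate pair absent from both structures counts as a true negative. The function names `parse_pairs` and `score_structure` are hypothetical and do not come from the paper.

```python
import math


def parse_pairs(dot_bracket):
    """Return the set of base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def score_structure(predicted, reference):
    """Compute SEN, PPV and MCC of a predicted structure against a reference."""
    n = len(reference)
    pred, ref = parse_pairs(predicted), parse_pairs(reference)
    tp = len(pred & ref)   # pairs present in both structures
    fp = len(pred - ref)   # predicted pairs missing from the reference
    fn = len(ref - pred)   # reference pairs the prediction missed
    # Assumption: all n*(n-1)/2 position pairs are candidates, so the
    # remaining ones are true negatives.
    tn = n * (n - 1) // 2 - tp - fp - fn
    sen = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "PPV": ppv, "MCC": mcc}


if __name__ == "__main__":
    print(score_structure("(....)", "((..))"))
```

Under these assumptions, scores of this kind would serve as the regression targets that the LSTM is trained to reproduce from the sequence and a candidate structure.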
Source
《计算机应用与软件》 (Computer Applications and Software)
2017, No. 9, pp. 232-239 (8 pages)
Funding
National Natural Science Foundation of China (61170125)