Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure, and such prediction can be improved by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenetic analysis of homologous RNA sequences, which is probably why they fall short in practical applications. Hence, we present a new SCFG-based method that integrates phylogenetic analysis with a newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, M, to describe the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, λ and λ', to perform phylogenetic analysis of homologous RNA sequences, where λ is for non-structural regions of the sequence and λ' is for structural regions; 3) we merge λ and λ' into M to obtain a combined model for predicting RNA secondary structure. We tested our method on data sets constructed from the Rfam database. Its sensitivity and specificity are higher than those of the predictions made by Pfold.
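The abstract reports sensitivity and specificity without saying how they were computed. As a rough sketch only, the following assumes a convention common in RNA structure evaluation: sensitivity is the fraction of reference base pairs that were predicted, and "specificity" is the fraction of predicted base pairs that appear in the reference (positive predictive value). The function name `pair_metrics` and the example pairs are hypothetical and are not taken from the paper.

```python
def pair_metrics(predicted, reference):
    """Compare two collections of base pairs given as (i, j) index tuples.

    Assumed convention (not stated in the abstract):
      sensitivity = TP / (number of reference pairs)
      specificity = TP / (number of predicted pairs), i.e. PPV
    """
    # Normalise each pair so that (j, i) and (i, j) count as the same pair.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}

    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


# Hypothetical example: three of four reference pairs are recovered,
# and one predicted pair is not in the reference.
reference = {(1, 20), (2, 19), (3, 18), (4, 17)}
predicted = {(1, 20), (2, 19), (3, 18), (5, 16)}
sens, ppv = pair_metrics(predicted, reference)
print(f"sensitivity={sens:.2f} specificity(PPV)={ppv:.2f}")
```

Under this convention a method can trade one measure against the other, for example by predicting fewer pairs, which is why both numbers are normally reported together.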
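Predicting RNA (ribonucleic acid) secondary structure with classical models such as stochastic context-free grammars (SCFGs) suffers from high computational complexity. To address this, the paper introduces a "new secondary structure element label" (NSSEL) representation of RNA secondary structure and, based on it, proposes a new neural network method for RNA secondary structure prediction. The NSSEL representation is easily converted into the commonly used CT format. Experiments on a tRNA data set show that, with identical training and test sets, the method achieves higher prediction accuracy and a higher correlation coefficient than the best-performing SCFG models, such as BJK and BK2, which demonstrates the feasibility and effectiveness of the proposed method. Because the neural network heuristic does not suffer from the computational time complexity problem, it is expected to be applicable to folding long RNA sequences of more than 1,000 bases, which are difficult for SCFG and similar algorithms to handle.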
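The abstract does not define the NSSEL labels, so the conversion it mentions cannot be reproduced here. To illustrate only the target CT format, the sketch below converts the widely used dot-bracket notation instead, which is a stand-in and not the paper's representation. The function name `dot_bracket_to_ct` and the example sequence are hypothetical. It assumes the usual six-column CT layout: index, base, index - 1, index + 1 (0 for the last base), pairing partner (0 if unpaired), and index again.

```python
def dot_bracket_to_ct(sequence, structure, name="example"):
    """Render a dot-bracket structure as CT-format text (no pseudoknots)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    partner = [0] * len(sequence)  # 0 means unpaired (CT convention)
    stack = []                     # 1-based positions of unmatched '('
    for i, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i - 1] = j
            partner[j - 1] = i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])

    n = len(sequence)
    lines = [f"{n} {name}"]  # header: length and a title
    for i, base in enumerate(sequence, start=1):
        next_index = i + 1 if i < n else 0
        lines.append(f"{i} {base} {i - 1} {next_index} {partner[i - 1]} {i}")
    return "\n".join(lines)


# Hypothetical six-base hairpin: bases 1-6 and 2-5 are paired.
print(dot_bracket_to_ct("GCAUGC", "((..))"))
```

A per-position labelling such as NSSEL would presumably need a comparable pass to recover pairing partners before it could be written out this way, but that step depends on label definitions the abstract does not give.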
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60673018.