Abstract
Based on the principle of symbolic dynamics, a novel graphical representation of RNA secondary structures is proposed. Using the biological information of free bases and the free energy of paired bases, the free bases and base pairs in an RNA secondary structure are mapped into two kinds of discrete time series. The mapping loses no information in converting an RNA secondary structure to its mathematical representation, and the paired regions of the RNA can be clearly identified in the resulting 2D graph. Characteristic matrices are constructed from this graphical representation, and a vector consisting of the leading eigenvalues of these matrices is then used to compare RNA secondary structures. Qualitative and quantitative similarity analyses are performed in both the time and frequency domains on a set of RNA secondary structures at the 3′-terminus of different viruses, and similar results are obtained in the two domains. Simulation results show that the proposed representation is effective for similarity analysis of RNA secondary structures. Compared with other methods for similarity analysis, the proposed method yields larger numerical differences between dissimilar and similar species, which makes different species easier to discriminate.
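The abstract does not give the numeric mapping or the construction of the characteristic matrices, so the following is only a minimal sketch of the general idea: turn a structure into two numeric series (one for free bases, one for base pairs) and compare two structures by their discrete Fourier transform magnitude spectra. Everything specific here is an assumption made for illustration: the dot-bracket input format, the values in `FREE_CODE` and `PAIR_CODE` (stand-ins for the paper's biological and free-energy values), the function names, and the Euclidean distance. The eigenvalue step is omitted because the abstract does not describe it.

```python
import cmath
import math

# Hypothetical codes; the paper's actual values are not given in the abstract.
FREE_CODE = {"A": 1.0, "C": 2.0, "G": 3.0, "U": 4.0}
PAIR_CODE = {
    frozenset("AU"): -2.0,  # stand-in for a pair's free-energy contribution
    frozenset("GC"): -3.0,
    frozenset("GU"): -1.0,
}


def to_series(seq, dot_bracket):
    """Map a sequence plus dot-bracket structure to (free_series, pair_series).

    Both series have one value per position. Free bases are written to the
    first series, paired bases to the second; the other series holds 0.0.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    free = [0.0] * len(seq)
    pair = [0.0] * len(seq)
    open_positions = []
    for i, (base, symbol) in enumerate(zip(seq, dot_bracket)):
        if symbol == ".":
            free[i] = FREE_CODE[base]
        elif symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure")
            j = open_positions.pop()
            # Both partners of a pair receive the same value.
            pair[i] = pair[j] = PAIR_CODE[frozenset((seq[j], base))]
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure")
    return free, pair


def dft_magnitudes(series):
    """Magnitudes of the discrete Fourier transform (direct O(n^2) sum)."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series)))
        for k in range(n)
    ]


def spectral_distance(a, b):
    """Euclidean distance between the DFT magnitude spectra of two series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    spec_a, spec_b = dft_magnitudes(a), dft_magnitudes(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))


free, pair = to_series("GGGAAACCC", "(((...)))")
```

For the toy hairpin above, `free` is nonzero only at the three loop positions and `pair` only at the six stem positions, so the paired region can be read directly from the second series. Comparing structures of different lengths would need padding or resampling, which this sketch leaves out.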
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
PKU Core (北大核心)
2013, No. 2, pp. 445-452 (8 pages)
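Funding
Fundamental Research Funds for the Central Universities (CDJXS10160001)
National Natural Science Foundation of China (61001157, 61101232)
Doctoral Fund of Southwest University (SWU111027)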
Keywords
RNA secondary structure
similarity analysis
graphical representation
symbolic dynamics
discrete Fourier transform