Abstract
Non-coding RNA (ncRNA) sequences transcribed from different organelle genomes are analyzed statistically in terms of three features: k-mer frequencies, the composition of reduced base alphabets, and triplet preferences in the structure-sequence mode (stru-seq mode). Each of these three feature extraction methods is used to build a feature vector representing an ncRNA sequence, and a support vector machine is then used to recognize the ncRNAs transcribed from four kinds of organelle genomes. A comparison of two base reduction schemes indicates that purine/pyrimidine reduction (MN reduction) reflects more of the sequence information of ncRNAs from different organelle genomes than PQ reduction does. The triplet short fragments (k=3) in stru-seq mode, which take both structure and base type into account, suggest that the interaction between ncRNAs and protein-coding mRNAs or proteins may show a preference for particular local structural triplets. The best overall prediction accuracy in the Jackknife test is 83.10%. Prediction results obtained with different parameters show that the structures of short fragments (k=3) in stru-seq mode help to distinguish ncRNAs transcribed from different organelle genomes.
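As an illustration of the first two feature types named in the abstract, the following is a minimal sketch of how a sequence might be reduced to a purine/pyrimidine alphabet and turned into a k-mer frequency vector. It is not the authors' implementation: the function names, the letters "R" and "Y" used for the reduced alphabet, and the handling of ambiguous bases are all assumptions made for this example. The structure-sequence triplets and the PQ reduction are not shown, since the abstract does not define them.

```python
from collections import Counter
from itertools import product

# Assumed labels for the reduced alphabet: purines (A, G) -> "R",
# pyrimidines (C, U, T) -> "Y". The paper's own symbols may differ.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "U": "Y", "T": "Y"}


def reduce_sequence(seq, mapping=PURINE_PYRIMIDINE):
    """Map each base to its reduced symbol; bases not in the mapping (e.g. N) are skipped."""
    return "".join(mapping[base] for base in seq.upper() if base in mapping)


def kmer_frequencies(seq, k, alphabet):
    """Return the relative frequency of every possible k-mer over `alphabet`.

    The vector has len(alphabet) ** k entries in lexicographic order of the
    alphabet as given, so vectors from different sequences are comparable.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    reduced = reduce_sequence("AUGCGA")
    print(reduced)
    print(kmer_frequencies(reduced, 2, "RY"))
```

Vectors built this way have a fixed length for a given k and alphabet, so they could be passed as input rows to any standard support vector machine implementation.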
Source
《内蒙古大学学报(自然科学版)》
CAS
Peking University Core Journals (北大核心)
2015, Issue 5, pp. 512-519 (8 pages)
Journal of Inner Mongolia University:Natural Science Edition
Funding
National Natural Science Foundation of China (Nos. 31460234 and 61361015)