Abstract
Newly sequenced species typically lack high-quality gene structures for training ab initio gene prediction software. To address this, we propose a method for building a reliable training gene set (BRTGS) based on the species' own RNA-seq assembly. The method first obtains a large number of initial gene structures from the RNA-seq assembly. It then uses protein homology evidence to select gene structures whose coding regions are correct and relatively complete. Finally, it determines the positions of the start and stop codons by combining the RNA-seq assembly structures with statistics derived from the protein homology evidence, yielding complete coding structures. Experimental results show that the method can build high-quality training gene sets for genomes at various assembly levels, and that ab initio prediction software trained on these gene sets achieves good prediction performance.
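The last two steps of the abstract (homology-based filtering, then start and stop codon placement) can be illustrated with a minimal Python sketch. It is not the BRTGS implementation. The function names, the coverage and identity thresholds, and the rule of taking the most upstream in-frame ATG are all assumptions made for illustration; the abstract only says that homology statistics and RNA-seq assembly structures are combined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_homology_filter(aligned_len, protein_len, identity,
                           min_coverage=0.9, min_identity=0.5):
    """Keep a transcript only if its alignment covers most of the
    homologous protein. Both thresholds are hypothetical defaults."""
    if protein_len <= 0:
        return False
    coverage = aligned_len / protein_len
    return coverage >= min_coverage and identity >= min_identity


def complete_cds(transcript, aln_start, aln_end):
    """Extend an in-frame, homology-supported region [aln_start, aln_end)
    of a transcript to a full coding sequence.

    Returns (cds_start, cds_end) in transcript coordinates, end exclusive
    and including the stop codon, or None if no complete CDS is found.
    """
    seq = transcript.upper()
    if (aln_end - aln_start) % 3 != 0:
        return None

    # Reject candidates with a premature stop inside the supported region.
    for i in range(aln_start, aln_end, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return None

    # Walk upstream codon by codon; remember the most upstream ATG seen
    # before an in-frame stop closes the reading frame (assumed heuristic).
    cds_start = None
    i = aln_start
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            cds_start = i
        i -= 3

    # Walk downstream to the first in-frame stop codon.
    cds_end = None
    for j in range(aln_end, len(seq) - 2, 3):
        if seq[j:j + 3] in STOP_CODONS:
            cds_end = j + 3
            break

    if cds_start is None or cds_end is None:
        return None
    return cds_start, cds_end


if __name__ == "__main__":
    # Toy transcript: 5' UTR with a stop, two in-frame ATGs, then a stop.
    toy = "CC" + "TAA" + "ATG" + "GCT" + "ATG" + "GCC" + "AAA" + "TGA" + "GG"
    print(complete_cds(toy, 14, 20))
```

A transcript for which `complete_cds` returns `None` would simply be left out of the training set in this sketch, which favors a smaller but cleaner set of complete gene structures.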
Authors
段荣静
刘金定
Duan Rongjing; Liu Jinding (Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing, 210095, China; Bioinformatics Center, Nanjing Agricultural University, Nanjing, 210095, China)
Source
《数据采集与处理》
CSCD
PKU Core Journal
2018, Issue 4, pp. 637-645 (9 pages)
Journal of Data Acquisition and Processing
Funding
Supported by the National Natural Science Foundation of China (31301691)
Supported by the Fundamental Research Funds for the Central Universities, Ministry of Education (KYZ201667, KJQN201430)