Abstract
【Objective】This study aims to provide a scientific basis for genome sequence annotation and to improve both the speed and the accuracy of annotation.【Method】Chromosome 6 of Oryza sativa L. ssp. japonica cv. Nipponbare, one of the longer rice chromosomes, was used as a case study. Bioinformatics methods were used to examine in detail how well Fgenesh (v2.0) predicts genes in rice, a model monocot plant.【Result】The predicted gene count encompassed the TIGR-annotated gene count and differed from it only slightly. Most predicted genes were multi-exon genes, accounting for 77.52% of the total. The lengths of the predicted genes varied widely. Judged by the number of significant matches, Fgenesh predicted multi-exon genes with relatively high accuracy: their support reached 100% from TIGR annotation and exceeded 78% from cDNA. By exon position within multi-exon genes, internal exons and last exons had the highest cDNA support. Some genes had high cDNA and/or TIGR annotation support, defined as >95% length and >78% similarity. In the highly supported multi-exon genes, internal exons were relatively short, whereas first and last exons were longer. Most highly supported single-exon genes were also relatively short. In terms of exon number, most highly supported multi-exon genes had no more than 5 exons.【Conclusion】Fgenesh predicts rice genes with high accuracy. However, the predicted genes should be aligned against a rice cDNA database, and the predictions should be revised manually where necessary according to their cDNA support.
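The abstract's evaluation rests on three simple notions:

- a "high support" cutoff (>95% length, >78% similarity);
- exon position classes (first, internal, last, single);
- the share of multi-exon genes.

The sketch below illustrates them under stated assumptions. It is hypothetical and does not parse real Fgenesh, TIGR or cDNA alignment output. The `Hit` record, all function names and the toy numbers are illustrative only. "Length" is assumed to mean the percentage of the predicted gene covered by the alignment.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Cutoffs quoted in the abstract for "high" support (strictly greater than).
MIN_LENGTH_COVERAGE = 95.0  # assumed: % of predicted gene length covered by the alignment
MIN_SIMILARITY = 78.0       # % similarity of the alignment


@dataclass
class Hit:
    """Hypothetical summary of one cDNA or TIGR alignment to a predicted gene."""
    coverage: float    # percent length coverage (assumed interpretation)
    similarity: float  # percent similarity


def is_high_support(hit: Hit) -> bool:
    """True if the alignment passes both cutoffs."""
    return hit.coverage > MIN_LENGTH_COVERAGE and hit.similarity > MIN_SIMILARITY


def exon_positions(n_exons: int) -> list[str]:
    """Label the exons of a gene, in transcription order, by position."""
    if n_exons < 1:
        raise ValueError("a gene needs at least one exon")
    if n_exons == 1:
        return ["single"]
    return ["first"] + ["internal"] * (n_exons - 2) + ["last"]


def multi_exon_share(exon_counts: list[int]) -> float:
    """Percentage of genes that have more than one exon."""
    multi = sum(1 for n in exon_counts if n > 1)
    return 100.0 * multi / len(exon_counts)


def mean_length_by_position(genes: list[list[int]]) -> dict[str, float]:
    """Mean exon length (bp) per position class, pooled over genes.

    Each gene is given as a list of exon lengths in transcription order.
    """
    pooled: dict[str, list[int]] = defaultdict(list)
    for lengths in genes:
        for label, length in zip(exon_positions(len(lengths)), lengths):
            pooled[label].append(length)
    return {label: float(mean(values)) for label, values in pooled.items()}


if __name__ == "__main__":
    # Toy, made-up inputs; not data from the study.
    print(is_high_support(Hit(coverage=97.0, similarity=80.0)))  # True
    print(exon_positions(4))  # ['first', 'internal', 'internal', 'last']
    print(multi_exon_share([1, 3, 5, 2]))  # 75.0
    print(mean_length_by_position([[300, 80, 120, 400], [250, 100, 350]]))
```

Pooling lengths by position class mirrors the abstract's comparison of internal exons against first and last exons. Applying `is_high_support` first would restrict that comparison to the highly supported genes.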
Source
Scientia Agricultura Sinica (《中国农业科学》), 2008, No. 6, pp. 1567-1574 (8 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
Funding
National Natural Science Foundation of China (301705760)
National "863" Major Special Program of China (2002AA207004)