Abstract
The distribution of open reading frame (ORF) counts as a function of ORF length was analyzed in the genomes of 5 eukaryotes, 15 bacteria, and 10 archaea. The distributions have a similar shape across organisms and show a clear regularity. Several candidate distribution models were fitted and compared, and for all 30 genomes the distribution was consistent with a Γ(α,β) distribution. This leads to the hypothesis that the number of ORFs as a function of length in a genome follows a Γ(α,β) distribution. The fitted α and β values correlate clearly with genome evolution. The evolutionary meaning of α and β is discussed, and the analysis concludes that eukaryotes preferentially use longer genes. Based on the Γ(α,β) distribution, the upper limit on the number of ORFs (protein-coding sequences) in the yeast Saccharomyces cerevisiae genome is estimated at approximately 5870. The approach is useful for studying genome evolution and for assessing the reliability of theoretically predicted genes.
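As an illustration of what fitting a Γ(α,β) distribution to ORF lengths involves, the following minimal Python sketch estimates α and β from a list of lengths. It rests on several assumptions that the abstract does not confirm: α is treated as the shape parameter and β as the scale parameter, the estimator is the method of moments (the paper's actual fitting procedure is not stated here), the function name `fit_gamma_moments` is hypothetical, and the demonstration data are synthetic random draws, not ORF lengths from any genome.

```python
import random
import statistics


def fit_gamma_moments(lengths):
    """Method-of-moments estimate of gamma parameters.

    Assumes alpha is the shape and beta is the scale, so that
    mean = alpha * beta and variance = alpha * beta**2.
    """
    mean = statistics.fmean(lengths)
    var = statistics.pvariance(lengths, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive lengths with non-zero variance")
    alpha = mean * mean / var  # shape (dimensionless)
    beta = var / mean          # scale (same units as the lengths)
    return alpha, beta


if __name__ == "__main__":
    # Synthetic stand-in for ORF lengths; NOT data from the paper.
    rng = random.Random(0)
    sample = [rng.gammavariate(2.0, 450.0) for _ in range(20000)]
    alpha, beta = fit_gamma_moments(sample)
    print(f"alpha ~ {alpha:.2f}, beta ~ {beta:.1f}")
```

With real data, `lengths` would hold the ORF lengths of one genome, and the resulting (α, β) pairs could then be compared across organisms.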
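Source
Acta Biophysica Sinica (《生物物理学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2004, No. 5, pp. 375-381 (7 pages)
Funding
National Natural Science Foundation of China (10147204)
Natural Science Foundation of Inner Mongolia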
Keywords
Genome
Open reading frame (ORF)
Γ(α,β) distribution
Genome evolution