Abstract
The distribution of the number of protein-coding sequences by sequence length was analyzed for each of the 24 chromosomes of the human genome, and the 24 distributions were found to be clearly similar in form. When the observed distributions were fitted with a Γ(α, β) distribution, the shape parameter α was smaller than 1 in every case. In other words, the number of protein-coding sequences kept increasing as sequence length decreased. In contrast, the other organisms studied (15 eubacteria, 10 archaea, and 5 eukaryotes) all followed Γ(α, β) distributions with α > 1. Based on this comparison, we infer that the distribution of protein-coding sequences in the human genome should also be a Γ(α, β) distribution with α > 1. After supplementing the short-sequence range with estimated data and refitting the Γ(α, β) distribution, a good fit was obtained, with α between 1.19 and 1.85. The Γ(α, β) distribution followed by genes is of practical value for studying genome evolution and for assessing the reliability of genes predicted by theoretical methods.
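The argument above hinges on one property of the Gamma density. Assuming the usual shape/scale parametrization, f(x) = x^(α−1) e^(−x/β) / (Γ(α) β^α). When α ≤ 1 the density is largest as x → 0 and falls steadily with length. When α > 1 it rises to a single peak at (α − 1)β and then falls. The abstract does not say how its fits were computed, so the sketch below uses a method-of-moments fit (α = mean²/variance, β = variance/mean) as a swapped-in technique. It is not necessarily the authors' method. The helper names (`gamma_pdf`, `fit_gamma_moments`, `describe_shape`) and the synthetic length data are hypothetical.

```python
import math
import random


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Gamma(alpha, beta), ASSUMING shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    # Work in log space so long sequence lengths do not overflow/underflow.
    log_p = ((alpha - 1) * math.log(x) - x / beta
             - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_p)


def fit_gamma_moments(lengths: list[float]) -> tuple[float, float]:
    """Method-of-moments fit (assumed technique): alpha = mean^2/var, beta = var/mean."""
    n = len(lengths)
    if n < 2:
        raise ValueError("need at least two lengths")
    mean = sum(lengths) / n
    var = sum((x - mean) ** 2 for x in lengths) / (n - 1)  # sample variance
    return mean * mean / var, var / mean


def describe_shape(alpha: float, beta: float) -> str:
    """Classify the fitted curve by comparing alpha with 1, as the abstract does."""
    if alpha <= 1:
        return "monotonically decreasing: counts keep rising as length shrinks"
    mode = (alpha - 1) * beta
    return f"unimodal, peak near length (alpha - 1) * beta = {mode:.1f}"


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical synthetic "CDS lengths", not data from the paper.
    for true_alpha, true_beta in [(1.5, 300.0), (0.7, 500.0)]:
        lengths = [random.gammavariate(true_alpha, true_beta) for _ in range(20000)]
        a, b = fit_gamma_moments(lengths)
        print(f"true alpha={true_alpha}: fitted alpha={a:.2f}, beta={b:.1f} -> "
              + describe_shape(a, b))
```

Python's `random.gammavariate(alpha, beta)` uses the same shape/scale convention, so the fitted α should land close to the value used to generate the data. A fitted α below 1 corresponds to the "count keeps rising as length shrinks" pattern reported for the raw human data. A fitted α above 1 corresponds to the single-peaked pattern seen in the other organisms.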
Source
Journal of Inner Mongolia Minzu University: Natural Sciences (《内蒙古民族大学学报(自然科学版)》)
2009, No. 1, pp. 58-64 (7 pages)