THE RESEARCH OF THE OCCURRENCE FREQUENCY DISTRIBUTION OF k-MER IN WHOLE DNA SEQUENCE

Abstract: Studying the distribution of k-mers in a genome helps in understanding the relationship between genome structure and function, and it plays an important role in recognizing repetitive subsequences, partitioning sequences into introns and exons, and investigating genome evolution. After a detailed description of the Hao method, which depicts k-mer frequencies as a fractal image, this paper proposes a method for generating a 3D frequency distribution map that displays the differences among k-mer occurrence frequencies directly. The 3D map is then converted into a 1D logarithmic frequency histogram, which brings out local features of the distribution; for example, the occurrence frequencies in the ultrahigh-frequency segment take discontinuous integer values. On the basis of this histogram, a criterion for partitioning the occurrence frequencies into segments is proposed. Palindromes among the forbidden k-mers are examined briefly in the forbidden segment, and the n-order zero interval phenomenon in the ultrahigh-frequency segment is investigated in detail. The authors hypothesize that the distribution of n-order zero intervals is a trace left by genome evolution, and they give biological explanations for the features of the logarithmic histogram. Experiments show that the occurrence frequency distribution of k-mers in many DNA sequences approximately follows a non-central F distribution; for genomes whose distribution has several peaks, a superposition of the same number of non-central F distributions can be used for the fit. The non-central F distribution is compared with the Gamma distribution, which Hsieh and Luo proposed for fitting genome distributions. Because the two are complementary in fitting, a new distribution combining them is presented, and experiments show that it fits the frequency distributions of real DNA sequences better than either distribution alone. Finally, the relationships between k and two special occurrence frequencies (the maximal occurrence frequency of a k-mer, and the number of distinct k-mers that occur exactly once) are studied. These two relationships are found to be consistent across species, which the authors interpret as evidence for the neutral theory of genome evolution.
Source: Acta Biophysica Sinica (《生物物理学报》), 2006, Issue 3, pp. 177-196 (20 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (grant 60233020)
Keywords: DNA sequence; k-mer; 3D frequency distribution map; non-central F distribution; fractal; n-order zero interval
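
The quantities the abstract discusses can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmer_counts`, `frequency_spectrum`, `special_frequencies`) and the toy sequence are hypothetical, and the sketch assumes overlapping k-mers over the alphabet A, C, G, T, skipping any window that contains another symbol. It counts k-mer occurrences, builds the 1D frequency histogram (how many distinct k-mers occur f times, with f = 0 for k-mers that never occur, which the abstract calls forbidden), and extracts the two special frequencies studied as functions of k.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq; windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


def frequency_spectrum(counts, k):
    """Map each occurrence frequency f to the number of distinct k-mers seen f times.

    The entry for f = 0 counts the k-mers (out of 4**k possible) that never occur.
    """
    spectrum = Counter(counts.values())
    absent = 4 ** k - len(counts)
    if absent:
        spectrum[0] = absent
    return dict(sorted(spectrum.items()))


def special_frequencies(counts):
    """Return (maximal occurrence frequency, number of k-mers occurring exactly once)."""
    max_freq = max(counts.values(), default=0)
    singletons = sum(1 for c in counts.values() if c == 1)
    return max_freq, singletons


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # hypothetical toy sequence, not real genome data
    for k in (1, 2, 3, 4):
        counts = kmer_counts(toy, k)
        print(k, frequency_spectrum(counts, k), special_frequencies(counts))
```

Taking the logarithm of the histogram values, or fitting a parametric density such as the non-central F or Gamma distribution to the normalized histogram, would be separate steps that this sketch leaves out.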

References (10 of 20 listed)

  • 1 Fleischmann RD, Adams MD, White O. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 1995, 269: 496~512
  • 2 The C. elegans Sequencing Consortium. Sequence and analysis of the genome of C. elegans. Science, 1998, 282: 2012~2018
  • 3 Jeffrey HJ. Chaos game representation of gene structure. Nucleic Acids Research, 1990, 18(8): 2163~2170
  • 4 Hao BL. Fractals from genomes: exact solutions of a biology-inspired problem. Physica A, 2000, 282: 225~246
  • 5 Hao BL, Lee HC, Zhang SY. Fractals related to long DNA sequences and complete genomes. Chaos, Solitons and Fractals, 2000, 11: 825~836
  • 6 Hao BL, Xie HM, Yu ZG. Factorizable language: from dynamics to bacterial complete genomes. Physica A, 2000, 288: 10~20
  • 7 Hao BL, Zheng WM. Applied Symbolic Dynamics and Chaos. Singapore: World Scientific, 1998
  • 8 Xie HM, Hao BL. Visualization of K-tuple distribution in prokaryote complete genomes and their randomized counterparts. IEEE Pro Comp sys Bioinf, 2003, 31~42
  • 9 Shen JJ, Zhang SY, Lee HC, Hao BL. SeeDNA: Visualization of k-string content of long DNA sequences and their randomized counterparts. Genomics, Proteomics & Bioinformatics, 2004, 2(3): 192~196
  • 10 Gorban AN, Popova TG, Sadovsky MG. Classification of symbol sequences over their frequency dictionaries: towards the connection between structure and natural taxonomy. Open Systems & Information Dynamics, 2000, 7(1)

