Some evidence suggests that coding DNA is more random than noncoding DNA, but conflicting evidence also exists. Based on the third-base degeneracy of codons, we regard the third position of codons as a 'noisy' position. By deleting one fixed position of the non-overlapping triplets in a given sequence, three masked sequences may be derived from that sequence. We have investigated the block-to-site mutual information functions of coding and noncoding sequences in yeast, both with and without the masking. Characteristics that distinguish coding from noncoding DNA have been found. It is observed that the strong correlations in the coding regions may be blocked by the third base of codons, and that proper masking can extract these correlations. The distribution of dimeric tandem repeats in unmasked sequences is also compared with that in masked sequences.
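The masking step described above can be illustrated with a minimal Python sketch. The function name `masked_sequences` is hypothetical, and the sketch assumes the triplet frame starts at the first base of the input, which the abstract does not specify.

```python
def masked_sequences(seq: str) -> list[str]:
    """Return three masked copies of seq.

    Copy k (k = 0, 1, 2) is obtained by deleting position k of every
    non-overlapping triplet. Assumption: triplets are counted from index 0.
    """
    return [
        "".join(base for i, base in enumerate(seq) if i % 3 != k)
        for k in range(3)
    ]


if __name__ == "__main__":
    # Two triplets: ATG GCC
    for k, masked in enumerate(masked_sequences("ATGGCC")):
        print(f"position {k + 1} deleted: {masked}")
```

If the input is a coding sequence in frame, the last of the three copies is the one with the third codon position removed; for a sequence in an unknown frame, all three copies would have to be examined.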
Modeling noncoding background sequences appropriately is important for detecting regulatory elements in DNA sequences. Based on the chi-square test, this paper explains why a higher-order Markov chain model should be chosen and how the proper order can be selected automatically. The chi-square test is first run on synthetic data sets to show that it can efficiently find the proper order of a Markov chain. Using the chi-square test, distinct higher-order context dependences are then found in ten sets of sequences of the yeast S. cerevisiae taken from the literature. A higher-order Markov chain is therefore more suitable for modeling noncoding background sequences than an independent model.
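One common way to test Markov order with a chi-square statistic can be sketched as follows. This is not necessarily the procedure used in the paper: the sketch assumes a standard conditional-independence test in which, for every shared context of length k-1, the base preceding the context is tested for independence from the base following it. The function name `chi_square_order_test` is hypothetical.

```python
from collections import Counter, defaultdict


def chi_square_order_test(seq: str, k: int) -> tuple[float, int]:
    """Test H0 'order k-1 suffices' against order k (k >= 1).

    Returns (chi-square statistic, degrees of freedom), summed over all
    observed contexts of length k-1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # tables[w][(a, b)] counts occurrences of a + w + b,
    # where w is the shared context of length k-1.
    tables = defaultdict(Counter)
    for i in range(len(seq) - k):
        a, w, b = seq[i], seq[i + 1:i + k], seq[i + k]
        tables[w][(a, b)] += 1

    stat, dof = 0.0, 0
    for table in tables.values():
        row, col = Counter(), Counter()
        for (a, b), n in table.items():
            row[a] += n
            col[b] += n
        total = sum(row.values())
        for a in row:
            for b in col:
                expected = row[a] * col[b] / total
                stat += (table[(a, b)] - expected) ** 2 / expected
        dof += (len(row) - 1) * (len(col) - 1)
    return stat, dof


if __name__ == "__main__":
    stat, dof = chi_square_order_test("AB" * 50, k=1)
    print(stat, dof)
```

To select an order, one could increase k until the statistic no longer exceeds the critical value of the chi-square distribution with the returned degrees of freedom (for example, obtained from `scipy.stats.chi2.sf`). Contexts with very small expected counts make the approximation unreliable, so long sequences are needed for higher orders.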
Funding: the Special Funds for Major National Basic Research Projects; the National Natural Science Foundation of China