Funding: Supported by the Science Research Project of Ningbo Dahongying University in 2011 (CF102601)
Abstract: [Objective] This research aimed to construct a discriminant classification model for DNA sequences by combining biological knowledge with mathematical methods. [Method] Based on the polarity of the side-chain groups of amino acids, amino acid classification information was extracted from sequences that differ in amino acid content; this information reflects sequence characteristics arising from both the content and the arrangement of bases. The information was represented as a four-dimensional vector, and the Mahalanobis distance and Fisher discriminant methods were used to classify the given sequences. [Result] For both classification methods, the back-substitution (resubstitution) rate on the samples was 100%, and the classification consistency rate was 90%. [Conclusion] The model is computationally simple and gives more accurate classification results than a discriminant classification model based on base content alone.
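The feature extraction and the two classifiers can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say exactly how the four-dimensional vector is built, so here it is assumed to hold the relative frequencies of four textbook side-chain polarity classes (nonpolar, polar uncharged, acidic, basic) among codons translated in frame 0 with the standard genetic code. The helper names `polarity_vector`, `mahalanobis_classify` and `fisher_classify`, and the two-group labels "A"/"B", are hypothetical.

```python
import numpy as np

# Standard genetic code; codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# ASSUMPTION: textbook four-class polarity grouping of side chains.
# Index 0 = nonpolar, 1 = polar uncharged, 2 = acidic, 3 = basic.
POLARITY_CLASSES = ["GAVLIMFWP", "STCYNQ", "DE", "KRH"]
POLARITY = {aa: idx for idx, group in enumerate(POLARITY_CLASSES) for aa in group}


def polarity_vector(dna):
    """Hypothetical 4-D feature: relative frequency of each polarity class
    over codons read in frame 0; stop codons and non-ACGT codons are skipped."""
    counts = np.zeros(4)
    dna = dna.upper()
    for p in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[p:p + 3])
        if aa is not None and aa != "*":
            counts[POLARITY[aa]] += 1
    total = counts.sum()
    return counts / total if total else counts


def mahalanobis_classify(x, groups):
    """Assign x to the group whose mean is nearest in Mahalanobis distance.
    groups maps a label to an (n_samples, 4) array of training vectors."""
    best_label, best_d2 = None, np.inf
    for label, X in groups.items():
        mu = X.mean(axis=0)
        # Pseudo-inverse: the four frequencies sum to 1, so the covariance is singular.
        S_inv = np.linalg.pinv(np.cov(X, rowvar=False))
        diff = x - mu
        d2 = float(diff @ S_inv @ diff)
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label


def fisher_classify(x, XA, XB):
    """Two-class Fisher linear discriminant: project onto w = Sw^+ (mA - mB)
    and compare with the midpoint of the projected class means."""
    mA, mB = XA.mean(axis=0), XB.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (np.cov(XA, rowvar=False) * (len(XA) - 1)
          + np.cov(XB, rowvar=False) * (len(XB) - 1))
    w = np.linalg.pinv(Sw) @ (mA - mB)
    threshold = w @ (mA + mB) / 2
    return "A" if w @ x > threshold else "B"
```

A typical call would be `mahalanobis_classify(polarity_vector(seq), {"A": XA, "B": XB})`, where each row of `XA` and `XB` is the vector of a training sequence of known class. Because every vector lies on the hyperplane where the components sum to 1, the sketch uses `np.linalg.pinv` instead of `np.linalg.inv`; with raw counts instead of frequencies, the ordinary inverse would usually suffice.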
Funding: Special Funds for Major National Basic Research Projects; National Natural Science Foundation of China
Abstract: Evidence seems to show that coding DNA is more random than noncoding DNA, but conflicting evidence also exists. Based on the third-base degeneracy of codons, we regard the third codon position as a 'noisy' position. By deleting one fixed position of every non-overlapping triplet in a given sequence, three masked sequences can be derived from it. We have investigated the block-to-site mutual information functions of coding and noncoding sequences in yeast, with and without masking, and found characteristics that distinguish coding from noncoding DNA. We observe that the strong correlations in coding regions may be hidden by the third base of codons, and that appropriate masking can extract these correlations. The distribution of dimeric tandem repeats in unmasked sequences is also compared with that in masked sequences.
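The masking procedure and a mutual-information measure can be illustrated with a short sketch. The abstract does not define the block-to-site mutual information function, so the definition used here is an assumption: the mutual information, in bits, between the block of `block_len` consecutive bases starting at position i and the single base `k` positions after the block's last base, with probabilities estimated from counts pooled over all valid i. The function names are hypothetical, and an incomplete trailing triplet is simply dropped when masking.

```python
from collections import Counter
from math import log2


def masked_sequences(seq):
    """Three masked sequences: for phase r in (0, 1, 2), delete position r of
    every complete non-overlapping triplet (a trailing partial triplet is dropped).
    If seq starts in the coding frame, r = 2 removes the third codon position."""
    n = len(seq) - len(seq) % 3
    return [
        "".join(seq[i + j] for i in range(0, n, 3) for j in range(3) if j != r)
        for r in range(3)
    ]


def block_to_site_mi(seq, block_len, k):
    """ASSUMED definition: mutual information (bits) between the block
    seq[i:i+block_len] and the base k >= 1 positions after the block's last base."""
    n_pos = len(seq) - block_len - k + 1  # number of valid start positions i
    if n_pos <= 0:
        return 0.0
    joint = Counter(
        (seq[i:i + block_len], seq[i + block_len - 1 + k]) for i in range(n_pos)
    )
    # Marginal counts derived from the joint counts.
    blocks, sites = Counter(), Counter()
    for (b, s), c in joint.items():
        blocks[b] += c
        sites[s] += c
    mi = 0.0
    for (b, s), c in joint.items():
        # p(b,s) * log2(p(b,s) / (p(b) * p(s))), with all counts divided by n_pos.
        mi += (c / n_pos) * log2(c * n_pos / (blocks[b] * sites[s]))
    return mi
```

For example, evaluating `block_to_site_mi` over a range of `k` for a coding sequence and for each of the three outputs of `masked_sequences` would show whether deleting a particular codon position changes the correlation profile, which is the kind of comparison the abstract describes.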