Abstract: Although a great deal of research has been undertaken on the annotation of gene structure, predictive techniques are still not fully developed. In this paper, based on the base composition of sequences and the conservation of nucleotides at exon/intron splice sites, a least increment of diversity algorithm (LIDA) is developed for studying and predicting three kinds of coding exons, introns, and intergenic regions. First, with the 64 trinucleotide compositions and 120 position parameters of the four bases selected as informational parameters, coding exon, intron, and intergenic sequences are predicted. The overall prediction accuracies are 91.1% and 88.4% for the A. thaliana and C. elegans genomes, respectively. Subsequently, based on the position frequencies of the four bases in regions near intron/coding exon boundaries and near the translation initiation and termination sites, 12 position parameters are selected as the diversity source, and the three kinds of coding exons are predicted using LIDA. The prediction success rates are higher than 80%. These results can be used in sequence annotation.
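The abstract does not state the formulas behind LIDA, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used definition of the diversity measure, D(X) = N ln N − Σ nᵢ ln nᵢ, and of the increment of diversity, ID(X, S) = D(X + S) − D(X) − D(S), and assigns a sequence to the class whose profile gives the smallest increment. Only the 64 trinucleotide counts are used (the position parameters are omitted), trinucleotides are counted in overlapping windows, and all function names and the toy sequences are hypothetical illustrations rather than the authors' implementation or data.

```python
import math
from collections import Counter
from itertools import product

# The 64 trinucleotides over the four bases, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_counts(seq):
    """Count overlapping trinucleotides (assumption: overlapping windows).

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [counts.get(t, 0) for t in TRINUCLEOTIDES]


def diversity(counts):
    """Assumed diversity measure: D = N ln N - sum(n_i ln n_i)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """Assumed increment of diversity: ID(X, S) = D(X + S) - D(X) - D(S)."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


def classify(seq, class_profiles):
    """Return the class label whose profile yields the least increment."""
    x = trinucleotide_counts(seq)
    return min(
        class_profiles,
        key=lambda label: increment_of_diversity(x, class_profiles[label]),
    )


# Toy profiles built from made-up sequences, for illustration only.
profiles = {
    "coding_exon": trinucleotide_counts("ATGATGATGATG"),
    "intron": trinucleotide_counts("TTTTTTTTTTTT"),
}
predicted = classify("ATGATGATG", profiles)
```

In this sketch a query whose trinucleotide usage resembles a class profile adds little diversity when merged with it, which is why the minimum increment is used as the decision rule. In practice the class profiles would be built from annotated training sequences rather than single toy strings.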