Computational gene structure prediction, which is valuable for finding new genes and understanding the composition of genomes, plays a very important role in various kinds of genome projects. For eukaryotic gene struc...Computational gene structure prediction, which is valuable for finding new genes and understanding the composition of genomes, plays a very important role in various kinds of genome projects. For eukaryotic gene structures, however, the prediction accuracy of existing methods is still limited. This paper presents a method of pre-dicting eukaryotic gene structures based on multilevel opti-mization. The complicated problem of predicting gene structure in eukaryotic DNA sequence containing multiple genes can be decomposed into a series of sub-problems at several levels with decreasing complexity, including the gene level (single-exon gene, multi-exon gene), the element level (exon, intron, etc.), and the feature level (functional site sig-nals, codon usage preference, etc.). On the basis of this de-composition, a multilevel model for the prediction of complex gene structures is created by a multilevel optimization proc-ess, in which the models dealing with sub-problems at low complexity level are first optimized respectively, and then optimally combined together to form models for those sub-problems at higher complexity level. Based on the multi-level model, a dynamic programming algorithm is designed to search for optimal gene structures from DNA sequences, and a new program GeneKey (1.0) for the prediction of eu-karyotic gene structures is developed. Testing results with widely used datasets demonstrate that the prediction accura-cies of GeneKey (1.0) at the nucleotide level, exon level and gene level are all higher than that of the well known program GENSCAN. A web server of GeneKey(1.0) is available at http://infosci.hust.edu.展开更多
文摘Computational gene structure prediction, which is valuable for finding new genes and understanding the composition of genomes, plays a very important role in various kinds of genome projects. For eukaryotic gene structures, however, the prediction accuracy of existing methods is still limited. This paper presents a method of pre-dicting eukaryotic gene structures based on multilevel opti-mization. The complicated problem of predicting gene structure in eukaryotic DNA sequence containing multiple genes can be decomposed into a series of sub-problems at several levels with decreasing complexity, including the gene level (single-exon gene, multi-exon gene), the element level (exon, intron, etc.), and the feature level (functional site sig-nals, codon usage preference, etc.). On the basis of this de-composition, a multilevel model for the prediction of complex gene structures is created by a multilevel optimization proc-ess, in which the models dealing with sub-problems at low complexity level are first optimized respectively, and then optimally combined together to form models for those sub-problems at higher complexity level. Based on the multi-level model, a dynamic programming algorithm is designed to search for optimal gene structures from DNA sequences, and a new program GeneKey (1.0) for the prediction of eu-karyotic gene structures is developed. Testing results with widely used datasets demonstrate that the prediction accura-cies of GeneKey (1.0) at the nucleotide level, exon level and gene level are all higher than that of the well known program GENSCAN. A web server of GeneKey(1.0) is available at http://infosci.hust.edu.