Abstract
Genes can be missed during genome annotation, even in bacteria, whose gene structures are comparatively simple. Knowledge-based gene prediction runs into difficulty when a potential gene has no significant homologues in well-curated databases. This work proposes a method for finding missing genes by systematically scanning the protein sequences of all possible open reading frames (ORFs) in a genome for protein sequence signatures. As a proof of concept, the authors analyzed the genome of Corynebacterium glutamicum, an industrially important fermentation bacterium, and found 25 candidate ORFs that carry significant protein sequence signatures, have no significant homologues in Swiss-Prot, and are not annotated in GenBank. Further analysis showed that 19 of these 25 ORFs have additional supporting evidence of being genes, 3 are likely pseudogenes, and the remaining 3 are gene-like sequences. These results demonstrate that the proposed method can effectively identify genes that lack obvious known homologues.
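The abstract does not specify how ORFs are enumerated or which signature database was scanned. As a rough illustration only, the Python sketch below enumerates ORFs on both strands in all six reading frames, assuming bacterial start codons ATG, GTG and TTG and a hypothetical minimum length of 30 codons, then scans each translated ORF with a regular-expression version of one PROSITE-style pattern, the ATP/GTP-binding P-loop motif [AG]-x(4)-G-K-[ST]. The names `find_orfs`, `scan_signatures` and `SIGNATURES` are hypothetical. The later filters described in the abstract (no significant Swiss-Prot homologue, not annotated in GenBank) would require external tools such as a BLAST search and an annotation lookup, and are not shown.

```python
import re
from typing import NamedTuple

# Standard genetic code, bases ordered T, C, A, G. Bacterial translation
# table 11 uses the same codon assignments; only start-codon usage differs.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
STARTS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOPS = {"TAA", "TAG", "TGA"}

# Hypothetical signature set: one PROSITE-style pattern written as a regex.
SIGNATURES = {"P-loop (Walker A)": re.compile(r"[AG].{4}GK[ST]")}


class ORF(NamedTuple):
    strand: str   # "+" or "-"
    frame: int    # 0, 1 or 2
    start: int    # 0-based, in the coordinates of that strand's sequence
    end: int      # exclusive, includes the stop codon
    protein: str  # translation without the stop codon


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(cds: str) -> str:
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                      for i in range(0, len(cds) - 2, 3))
    # Alternative start codons (GTG, TTG) are read as Met at the initiator.
    return "M" + protein[1:] if protein else protein


def find_orfs(seq: str, min_aa: int = 30) -> list[ORF]:
    """Return one ORF per stop codon, starting at the first start codon
    after the previous stop, in all six reading frames."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon in STARTS:
                    start = pos
                elif start is not None and codon in STOPS:
                    protein = translate(s[start:pos])
                    if len(protein) >= min_aa:
                        orfs.append(ORF(strand, frame, start, pos + 3, protein))
                    start = None
    return orfs


def scan_signatures(orfs, signatures):
    """Return (orf, signature name, match position, matched text) tuples.
    finditer reports non-overlapping matches only."""
    hits = []
    for orf in orfs:
        for name, pattern in signatures.items():
            for m in pattern.finditer(orf.protein):
                hits.append((orf, name, m.start(), m.group()))
    return hits


if __name__ == "__main__":
    demo = "ATG" + "GGT" + "GCT" * 4 + "GGT" + "AAA" + "ACT" + "TAA"
    for orf, name, pos, motif in scan_signatures(find_orfs(demo, min_aa=5), SIGNATURES):
        print(orf.strand, orf.start, orf.end, name, pos, motif)
```

Taking the first start codon after the previous stop gives a single, maximal ORF per stop codon; nested ORFs that begin at later start codons are not listed separately in this sketch, and a real pipeline would scan against a full curated motif or profile database rather than one hand-written pattern.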
Source
Industrial Microbiology (工业微生物), 2014, No. 3, pp. 70-76 (7 pages)
Indexed in: CAS, CSCD