
Improving genome annotation of Corynebacterium glutamicum by using protein signatures
Abstract: Genes may be missed during genome annotation, even in bacteria, whose gene structures are comparatively simple. Knowledge-based gene prediction approaches run into difficulty when a potential gene has no significant homologues in well-curated databases. This work proposes finding missed genes by systematically scanning all possible open reading frames (ORFs) in a genome for protein sequence signatures. As a proof of concept, the genome of Corynebacterium glutamicum, an important industrial fermentation microorganism, was analyzed, yielding 25 candidate ORFs that carry significant protein signatures, have no significant homologues in Swiss-Prot, and are not annotated in GenBank. Further analysis showed that 19 of them are probable genes, 3 are probable pseudogenes, and 3 are gene-like sequences. These results indicate that the method can effectively identify genes that lack significant known homologues.
Source: Industrial Microbiology (《工业微生物》), CAS, CSCD, 2014, Issue 3, pp. 70-76 (7 pages)
Keywords: protein signature; Corynebacterium glutamicum; genome annotation
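
The scanning idea described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the abstract does not say which signature database or scanning tool was used, so the function names (`six_frame_orfs`, `scan_signatures`), the stop-to-stop definition of "all possible ORFs", the `min_aa` length cutoff, and the use of plain regular expressions as signatures are all assumptions made here. The example signature is the well-known P-loop (Walker A) pattern `[AG]-x(4)-G-K-[ST]`, rewritten as a regular expression.

```python
import re

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(genome, min_aa=30):
    """Yield (strand, frame, protein) for each stop-to-stop ORF.

    Assumption: an ORF is any stretch between stop codons that is at
    least min_aa residues long; no start codon is required.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_aa:
                    yield strand, frame, segment


def scan_signatures(genome, signatures, min_aa=30):
    """Return (strand, frame, signature_name, protein) for every ORF that
    matches a signature. `signatures` maps a name to a regex (hypothetical
    stand-in for a real signature database)."""
    hits = []
    for strand, frame, protein in six_frame_orfs(genome, min_aa):
        for name, pattern in signatures.items():
            if re.search(pattern, protein):
                hits.append((strand, frame, name, protein))
    return hits


# Toy input: a short made-up sequence encoding a P-loop-like motif.
demo = "ATG" + "GCT" * 5 + "GGTAAAACT" + "TAA"
for hit in scan_signatures(demo, {"P-loop": r"[AG].{4}GK[ST]"}, min_aa=5):
    print(hit)
```

In a workflow like the one the abstract outlines, the hits would then presumably be filtered further, keeping only ORFs without significant Swiss-Prot homologues and without an existing GenBank annotation; those filtering steps depend on external databases and are not shown here.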
