Improvement of genome annotation of Corynebacterium glutamicum by using protein signature


Abstract: Genes can be missed during genome annotation, even in bacteria, whose gene structures are comparatively simple. Knowledge-based gene prediction methods run into difficulty when a potential gene has no significant homologue in well-curated databases. This work proposes finding missed genes by systematically scanning all possible open reading frames (ORFs) of a genome for protein sequence signatures. As a proof of concept, the genome of Corynebacterium glutamicum, an important industrial fermentation microorganism, was analyzed, yielding 25 candidate ORFs that carry significant protein signatures, have no significant homologues in Swiss-Prot, and are not annotated in GenBank. Further analysis classified 19 of the 25 as probable genes, 3 as probable pseudogenes, and 3 as gene-like sequences. These results indicate that the method is effective for identifying genes that lack significant known homologues.
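The scanning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe in detail. The sketch assumes that "all possible ORFs" means stop-to-stop stretches in all six reading frames, that signatures can be expressed as regular expressions, and that a length cutoff `min_aa` is applied. The function names (`iter_orfs`, `scan_signatures`) and the choice of the PROSITE P-loop pattern `[AG]-x(4)-G-K-[ST]` as the example signature are illustrative assumptions.

```python
import re

# Standard genetic code in TCAG order (codon assignments are the same in
# the bacterial table 11; only the permitted start codons differ).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def iter_orfs(genome, min_aa=30):
    """Yield (strand, frame, protein) for stop-to-stop ORFs in six frames.

    Assumption: an ORF is any stretch between stop codons of at least
    min_aa residues; no start codon is required.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_aa:
                    yield strand, frame, segment


def scan_signatures(genome, patterns, min_aa=30):
    """Return (signature name, strand, frame, protein) for every ORF hit."""
    hits = []
    for strand, frame, protein in iter_orfs(genome, min_aa):
        for name, regex in patterns.items():
            if regex.search(protein):
                hits.append((name, strand, frame, protein))
    return hits


# Example signature: PROSITE-style P-loop motif [AG]-x(4)-G-K-[ST]
# written as a regular expression (chosen here only for illustration).
PATTERNS = {"P-loop": re.compile(r"[AG].{4}GK[ST]")}
```

Calling `scan_signatures(genome_sequence, PATTERNS)` on a genome string returns the signature-carrying ORFs. In a workflow like the one the abstract outlines, such hits would then be filtered against existing annotations and homology searches; those steps are outside this sketch.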

Authors: ZHOU Dawei (周大为), LI Weijiang (李炜疆)

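Affiliations: Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University; School of Biotechnology, Jiangnan University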

Source: Industrial Microbiology (《工业微生物》), 2014, Issue 3, pp. 70-76 (7 pages). Indexed in CAS and CSCD.

Keywords: protein signature; Corynebacterium glutamicum; genome annotation

Classification number: Q78 [Biology: Molecular Biology]


