Abstract
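Conventional gene annotation by computer software has shortcomings, and accurate genome annotation remains a challenging task. This study aimed to refine the genome annotation of Shigella flexneri using a proteogenomics approach. Whole-cell proteins were extracted from Shigella flexneri 2a strain 301 (Sf2a301); the peptide mixture produced by trypsin digestion was separated by two-dimensional liquid chromatography and analyzed by on-line ESI tandem mass spectrometry. The mass spectrometry data were searched against a six-reading-frame database of Sf2a301, and the identifications were further assessed by bioinformatics analysis and experimental validation. In total, the protein products of 729 annotated Sf2a301 genes were verified, and the distributions of the identified proteins in molecular weight, isoelectric point, and hydrophobicity followed the same trends as those of the annotated proteins in the Sf2a301 genome. Six previously unannotated novel genes were discovered and further validated at the transcript level by RT-PCR. Proteogenomics can effectively refine the genome annotation of Shigella: it not only verifies annotated genes but also discovers new genes that supplement the existing annotation, and this strategy may be extended to the genome annotation of other sequenced organisms.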
Because computational prediction of protein-coding genes has inherent shortcomings, accurate genome annotation remains a bottleneck in knowledge acquisition. Here we apply a proteogenomic approach to improve the conventional genome annotation of S. flexneri. Bacterial proteins of S. flexneri 2a str. 301 (Sf2a301) were extracted and digested with trypsin, and the resulting peptides were separated by two-dimensional liquid chromatography and then analyzed by on-line ESI tandem mass spectrometry. The MS spectra were searched against a database of all six possible reading frames generated from the Sf2a301 genome, and the MS results were verified by bioinformatics analysis and biological experiments. A total of 729 proteins of Sf2a301 were unambiguously identified. The distributions of molecular weight (MW), isoelectric point (pI), and GRAVY of the identified proteins were similar to those of the annotated proteins of Sf2a301. Notably, 6 unannotated novel genes were discovered, and the transcripts of all of them were confirmed by RT-PCR. Our findings indicate that proteogenomic analysis is well suited to accurate genome-wide annotation of S. flexneri, covering both validation of predicted genes and discovery of novel genes. The proteogenomic strategy could become a routine part of genome annotation for other sequenced organisms.
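The abstract does not say how the six-reading-frame database was built. As a rough illustration only, the sketch below translates a DNA sequence in all six frames and splits each frame at stop codons to give candidate protein segments. The function names, the `min_len` cutoff, and the use of the standard codon assignments are assumptions made for this example, not details from the study.

```python
from itertools import product

# Standard codon assignments (NCBI ordering, bases in T, C, A, G order).
# Assumption: the bacterial table differs from this one only in start codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_to_stop_segments(protein, min_len=20):
    """Split a translated frame at stop codons ('*'), keeping long segments.

    min_len is a hypothetical cutoff chosen for illustration.
    """
    return [seg for seg in protein.split("*") if len(seg) >= min_len]
```

For a genome sequence held in a string `genome`, calling `six_frame_translations(genome)` and passing each frame through `stop_to_stop_segments` yields entries that could be written to a FASTA file for a search engine. A real pipeline would also need to record genomic coordinates for each segment so that identified peptides can be mapped back to the genome.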
Source
Genomics and Applied Biology (《基因组学与应用生物学》)
CAS
CSCD
Peking University Core Journals
2016, Issue 6, pp. 1437-1442 (6 pages)
Genomics and Applied Biology
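Funding
National Natural Science Foundation of China (81302323)
Science and Technology Research Project of Hebei Higher Education Institutions (QN20131059)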
Natural Science Foundation of Hebei Province (2013209194, 2014209140)
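Jointly funded with the North China University of Science and Technology Cultivation Fund (GP201518) and the North China University of Science and Technology Doctoral Research Start-up Fund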