
Evaluation and correction of contamination overestimation in metagenome-derived genomes caused by assembly breaks

Marker gene broken caused overestimation on the contamination of metagenome-assembled genomes and its correction
Abstract
[Objective] To identify and correct the overestimated contamination of metagenome-assembled genomes (MAGs) caused by broken marker genes.
[Methods] Simulated data, generated by randomly fragmenting the complete genomes of pure-culture isolates, were used to analyze how broken genes affect genome quality assessment and to set the correction parameters. Whether two broken marker genes (a broken gene pair) derive from the same source marker gene was judged from their taxonomic annotation against the nr database. Contamination was then recalculated after the redundant broken marker genes were removed.
[Results] In the simulated fragmented isolate genomes, contamination rose as the degree of genome fragmentation increased, and the same trend was observed in MAGs obtained by genome binning. The correction pipeline brought the contamination of the simulated fragmented genomes back to the level of the corresponding complete genomes. Applied to 760 MAGs from gut and soil metagenomes with contamination greater than 0, it lowered the contamination of nearly half of them, and 43 dropped to 0.
[Conclusion] The pipeline can partly correct contamination that is overestimated because of broken genes, makes more binned MAGs usable, and can be applied to the quality assessment of the growing number of metagenome-derived genomes.
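The abstract does not spell out the pipeline's exact rules, so the following is only a minimal Python sketch of the idea under stated assumptions. It computes a simplified CheckM-style contamination (for each collocated marker set, the extra copies of each marker divided by the set size, averaged over all sets and expressed as a percentage, as we understand CheckM's definition) and then collapses pairs of marker hits that look like two halves of one broken gene. The `MarkerHit` record, the `partial` flag (ORF truncated at a contig end), the "different contigs" rule and the exact-string comparison of nr-derived taxa are hypothetical choices for illustration; the paper's own parameters were tuned on simulated fragmented genomes and are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerHit:
    """One predicted gene that matched a marker HMM (hypothetical record)."""
    gene_id: str   # ORF identifier
    marker: str    # marker gene it matched (e.g. a Pfam/TIGRFAM accession)
    contig: str    # contig carrying the ORF
    taxon: str     # lineage of its best nr hit (assumed precomputed)
    partial: bool  # True if the ORF is truncated at a contig end (assumed flag)


def contamination(hits, marker_sets):
    """Simplified CheckM-style contamination in percent.

    For each marker set s, sum (copies - 1) over markers seen more than once,
    divide by |s|, then average over all marker sets.
    """
    copies = defaultdict(int)
    for hit in hits:
        copies[hit.marker] += 1
    total = 0.0
    for s in marker_sets:
        total += sum(max(copies[g] - 1, 0) for g in s) / len(s)
    return 100.0 * total / len(marker_sets)


def drop_broken_redundancy(hits):
    """Keep one hit per presumed broken pair of the same marker.

    Two partial hits to the same marker, on different contigs and with the
    same nr-derived taxon, are treated as fragments of one source gene.
    Complete hits are never merged, so genuine extra copies still count.
    """
    by_marker = defaultdict(list)
    for hit in hits:
        by_marker[hit.marker].append(hit)
    kept = []
    for group in by_marker.values():
        kept.extend(h for h in group if not h.partial)
        partials = [h for h in group if h.partial]
        paired = set()  # indices already absorbed as a second fragment
        for i, a in enumerate(partials):
            if i in paired:
                continue
            kept.append(a)  # this fragment stands in for the source gene
            for j in range(i + 1, len(partials)):
                b = partials[j]
                if j not in paired and b.contig != a.contig and b.taxon == a.taxon:
                    paired.add(j)
                    break  # each fragment pairs with at most one partner
    return kept


# Toy bin: two marker sets, marker M2 broken across contigs c1 and c2,
# plus one foreign (Escherichia) copy of M3 as genuine contamination.
MARKER_SETS = [{"M1", "M2"}, {"M3", "M4"}]
HITS = [
    MarkerHit("g1", "M1", "c1", "Bacillus", False),
    MarkerHit("g2", "M2", "c1", "Bacillus", True),      # fragment at the end of c1
    MarkerHit("g3", "M2", "c2", "Bacillus", True),      # fragment at the start of c2
    MarkerHit("g4", "M3", "c2", "Bacillus", False),
    MarkerHit("g5", "M3", "c3", "Escherichia", False),  # genuine foreign copy
    MarkerHit("g6", "M4", "c2", "Bacillus", False),
]

print(contamination(HITS, MARKER_SETS))                          # 50.0
print(contamination(drop_broken_redundancy(HITS), MARKER_SETS))  # 25.0
```

In this toy input the two M2 fragments share a Bacillus annotation and sit on different contigs, so one is discarded as redundant; the second M3 copy is annotated to a different taxon and is kept. Contamination therefore falls from 50.0 to 25.0 rather than to 0, mirroring the abstract's point that only the broken-gene share of the estimate is corrected.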
Authors: Hao Li, Dongxu Yang, Linran Wen, Wei Zheng, Feng Guo (School of Life Sciences, Xiamen University, Xiamen 361102, Fujian Province, China; Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 510275, Guangdong Province, China; Key Laboratory of Microbial Resource (Fujian), Xiamen 361102, Fujian Province, China)
Source: Acta Microbiologica Sinica (微生物学报), 2021, Issue 9, pp. 2921-2933 (13 pages). Indexed in CAS, CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (31670492, 31500100).
Keywords: metagenome-assembled genome; genome quality; CheckM; contamination