Abstract
[Objective] To identify and correct the overestimation of contamination in metagenome-assembled genomes (MAGs) caused by broken marker genes. [Methods] Simulated datasets built by randomly fragmenting complete genomes of isolates were used to analyze the impact of broken genes on genome quality assessment and to set the correction parameters. Taxonomic annotation against the nr database was used to determine whether two broken marker genes (a broken gene pair) derive from the same "source" marker gene, and contamination was recalculated after the redundant broken genes were removed. [Results] In the simulated fragmented genomes, contamination increased with the degree of genome fragmentation, and the same trend was observed in draft genomes obtained by binning. The correction pipeline restored the contamination of the simulated fragmented genomes to the level of the complete genomes. When applied to 760 MAGs from gut and soil metagenomes with contamination greater than 0, contamination decreased for nearly half of the MAGs, and for 43 of them it dropped to 0. [Conclusion] The pipeline can correct, to some extent, the overestimation of genome contamination caused by broken genes and thereby improve the usability of binned draft genomes. It can be applied to the quality assessment of the growing number of metagenome-derived genomes.
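The correction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `MarkerHit` record, the rule used to call a broken pair (same marker, same best nr subject, alignments covering nearly non-overlapping parts of that subject), the `max_overlap` threshold, and the simplified contamination formula (extra marker copies divided by the number of expected markers) are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MarkerHit:
    """Hypothetical record: one predicted gene matching a marker family."""
    gene_id: str
    marker: str      # marker family the gene matches
    nr_subject: str  # best-hit subject in nr (assumed proxy for the "source" gene)
    s_start: int     # alignment start on the nr subject (1-based, inclusive)
    s_end: int       # alignment end on the nr subject (inclusive)


def is_broken_pair(a, b, max_overlap=10):
    """Assumed rule: same marker, same nr subject, near-disjoint subject ranges."""
    if a.marker != b.marker or a.nr_subject != b.nr_subject:
        return False
    overlap = min(a.s_end, b.s_end) - max(a.s_start, b.s_start) + 1
    return overlap <= max_overlap


def count_sources(group, max_overlap=10):
    """Count distinct source genes by merging fragments judged to be broken pairs."""
    parent = list(range(len(group)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(group)), 2):
        if is_broken_pair(group[i], group[j], max_overlap):
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(group))})


def contamination(hits, n_markers, correct=False, max_overlap=10):
    """Simplified contamination (%): extra marker copies / expected markers."""
    by_marker = defaultdict(list)
    for hit in hits:
        by_marker[hit.marker].append(hit)
    extra = 0
    for group in by_marker.values():
        copies = count_sources(group, max_overlap) if correct else len(group)
        extra += copies - 1
    return 100.0 * extra / n_markers


# Toy data: marker A is split into two fragments of one source gene,
# marker B is genuinely duplicated, marker C is single-copy, marker D is absent.
hits = [
    MarkerHit("g1", "A", "nr_001", 1, 100),
    MarkerHit("g2", "A", "nr_001", 105, 250),
    MarkerHit("g3", "B", "nr_002", 1, 300),
    MarkerHit("g4", "B", "nr_003", 1, 300),
    MarkerHit("g5", "C", "nr_004", 1, 200),
]
raw = contamination(hits, n_markers=4)                      # 50.0
corrected = contamination(hits, n_markers=4, correct=True)  # 25.0
```

In this toy case only the fragmented marker A is collapsed, so the genuine duplication of marker B still counts toward contamination. Real quality-assessment tools weight markers by collocated sets, so the numbers here are only indicative.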
Authors
Hao Li (李浩); Dongxu Yang (杨东旭); Linran Wen (温林冉); Wei Zheng (郑伟); Feng Guo (郭峰)
School of Life Sciences, Xiamen University, Xiamen 361102, Fujian Province, China; Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 510275, Guangdong Province, China; Key Laboratory of Microbial Resource (Fujian), Xiamen 361102, Fujian Province, China
Source
Acta Microbiologica Sinica (《微生物学报》)
Indexed in: CAS, CSCD, Peking University Core Journals
2021, Issue 9, pp. 2921-2933 (13 pages)
Funding
National Natural Science Foundation of China (31670492, 31500100).