Abstract
Introduction: Salmonella is a key intestinal pathogen in foodborne disease, and its plasmids are linked to many biological characteristics, including virulence and drug resistance. Bacterial draft genomes contain large numbers of sequenced plasmid contigs, but these are often difficult to distinguish from chromosomal contigs. Methods: In this study, three customized Kraken databases were used to build three Kraken classifiers. Two benchmark datasets were constructed, one of complete genomes and one of simulated draft genomes. Five-fold cross-validation on both benchmark datasets was used to evaluate the performance of the three classifiers. Results: The classifier built from all National Center for Biotechnology Information (NCBI) plasmids and Salmonella complete genomes gave the best predictive performance. This classifier was then applied to Salmonella isolated in China. The plasmid carriage rate among these isolates was 91.01%, and the Kraken classifier identified more plasmid contigs and antibiotic resistance genes (ARGs) than a plasmid replicon-based method (PlasmidFinder). Moreover, among strains carrying ARGs, plasmids carried more ARGs [three, 95% confidence interval (CI): 1–14] than chromosomes (one, 95% CI: 1–7). Discussion: A Kraken classifier built on a high-quality customized database is well suited to predicting Salmonella plasmid sequences in bacterial draft genomes. The Kraken classifier established in this study is expected to play a significant role in future ARG monitoring.
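The five-fold cross-validation named in the Methods can be illustrated with a minimal sketch. It is not the study's pipeline. It only shows how items (for example genome accessions, an assumption about the unit of splitting) might be partitioned into five folds and how plasmid-versus-chromosome predictions might be scored. The function names `k_fold_splits` and `binary_metrics`, and the choice of sensitivity, specificity, and precision as metrics, are hypothetical; building a Kraken database from each training fold and classifying the held-out fold is left out.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def k_fold_splits(
    items: Sequence[str], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) lists; every item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def binary_metrics(
    truth: Sequence[str], predicted: Sequence[str], positive: str = "plasmid"
) -> Dict[str, float]:
    """Score predictions, treating `positive` as the class of interest."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }
```

In use, each training fold would supply the reference sequences for a database, each test fold would be classified against it, and the per-fold metrics would be averaged across the five folds.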
Funding
Supported by the National Key Research and Development Program of China (2020YFE0205700, 2022YFC2303900), the major projects of the National Natural Science Foundation of China (22193064), and the Science Foundation (2022SKLID303) of the State Key Laboratory of Infectious Disease Prevention and Control, China.