Funding: This study was supported by GePEM (Instituto de Salud Carlos III (ISCIII)/PI16/01478/Cofinanciado FEDER), DIAVIR (Instituto de Salud Carlos III (ISCIII)/DTS19/00049/Cofinanciado FEDER, Proyecto de Desarrollo Tecnológico en Salud), Resvi-Omics (Instituto de Salud Carlos III (ISCIII)/PI19/01039/Cofinanciado FEDER), BI-BACVIR (PRIS-3, Agencia de Conocimiento en Salud (ACIS), Servicio Gallego de Salud (SERGAS), Xunta de Galicia, Spain), Programa Traslacional Covid-19 (ACIS, Servicio Gallego de Salud (SERGAS), Xunta de Galicia, Spain) and Axencia Galega de Innovación (GAIN, IN607B 2020/08, Xunta de Galicia, Spain) to A.S., and by ReSVinext (Instituto de Salud Carlos III (ISCIII)/PI16/01569/Cofinanciado FEDER) and Enterogen (Instituto de Salud Carlos III (ISCIII)/PI19/01090/Cofinanciado FEDER) to F.M.-T. We gratefully acknowledge GISAID and the contributing laboratories (Supplementary Table S1) for giving us access to the SARS-CoV-2 genomes used in the present study.
Abstract: DEAR EDITOR, Analysis of SARS-CoV-2 genome variation using a minimal number of selected informative sites that make up a genetic barcode presents several drawbacks. We show that purely mathematical procedures for site selection should be supervised by the known phylogeny (i) to ensure that solid tree branches are represented instead of mutational hotspots with poor phylogeographic properties, and (ii) to avoid phylogenetic redundancy. We propose a procedure that prevents information redundancy in site selection by considering the cumulative informativeness of previously selected sites (as a proxy for phylogeny-based criteria). This procedure demonstrates that, for short barcodes (e.g., 11 sites), there are thousands of informative site combinations that improve on previous proposals. We also show that barcodes based on worldwide databases inevitably prioritize variants located at the basal nodes of the phylogeny, such that most of the genomes representative of these ancestral nodes are no longer in circulation. Consequently, coronavirus phylodynamics cannot be properly captured by universal genomic barcodes, because most SARS-CoV-2 variation is generated in geographically restricted areas by the continuous introduction of domestic variants.
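The difference between ranking sites one by one and scoring them by their cumulative informativeness can be illustrated with a small greedy search. This is a minimal sketch, not the authors' procedure: it assumes the genomes are pre-aligned, equal-length strings, it uses the joint Shannon entropy of the barcode haplotypes as a stand-in for "cumulative informativeness", and it does not consult any phylogeny. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def joint_entropy(genomes: Sequence[str], sites: Sequence[int]) -> float:
    """Shannon entropy (bits) of the haplotypes formed by the given columns.

    Assumption: this entropy is a stand-in for barcode informativeness.
    """
    counts = Counter(tuple(g[i] for i in sites) for g in genomes)
    n = len(genomes)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def independent_ranking(genomes: Sequence[str], k: int) -> List[int]:
    """Naive baseline: score each site on its own and keep the top k."""
    length = len(genomes[0])
    ranked = sorted(range(length),
                    key=lambda s: joint_entropy(genomes, [s]),
                    reverse=True)
    return ranked[:k]


def greedy_barcode(genomes: Sequence[str], k: int) -> List[int]:
    """Cumulative selection: add the site that maximizes the joint entropy of
    the barcode built so far, so sites redundant with earlier picks add ~0."""
    length = len(genomes[0])
    chosen: List[int] = []
    for _ in range(k):
        candidates = [s for s in range(length) if s not in chosen]
        best = max(candidates,
                   key=lambda s: joint_entropy(genomes, chosen + [s]))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 1 always co-vary (redundant).
    toy = ["AAC", "AAC", "GGC", "GGT"]
    print(independent_ranking(toy, 2))  # [0, 1]
    print(greedy_barcode(toy, 2))       # [0, 2]
```

On the toy alignment, ranking sites independently picks both co-varying columns (joint entropy 1.0 bit), whereas the cumulative search replaces the redundant column with column 2 and reaches 1.5 bits.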
Funding: Support from the project GePEM (ISCIII/PI16/01478/Cofinanciado FEDER) of the Instituto de Salud Carlos III, and support from the project ReSVinext (ISCIII/PI16/01569/Cofinanciado FEDER).
Abstract: Research on biogeographical ancestry (BGA) is of growing interest in forensic genetics and in the biomedical literature [1]. For instance, predicting the ethnicity of an unknown suspect from DNA profiles found at a crime scene is of utmost interest in criminalistics [2], and several autosomal SNP panels have been designed and tested for BGA investigations [3,4]. Most of these panels aim to discriminate among three main continental groups (sub-Saharan Africans, Europeans, and Asians) by testing a number of ancestry-informative markers (AIMs), ranging from a few dozen to a few hundred [5] (see Supplementary data online for more background).
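To show how genotypes at a panel of AIMs can separate the three continental groups, here is a minimal naive Bayes sketch. It is one common way of scoring a profile against reference populations, not a method described in the text above. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, markers independent of one another (no linkage), equal prior probabilities for the groups, and allele frequencies strictly between 0 and 1. The frequency table and function names are hypothetical.

```python
from math import exp, log
from typing import Dict, List, Sequence

# Hypothetical reference-allele frequencies at three AIMs (made-up values).
FREQS: Dict[str, List[float]] = {
    "sub-Saharan African": [0.90, 0.10, 0.80],
    "European": [0.10, 0.85, 0.20],
    "Asian": [0.15, 0.20, 0.85],
}


def genotype_prob(p: float, g: int) -> float:
    """Hardy-Weinberg probability of carrying g (0, 1 or 2) copies of an
    allele whose population frequency is p."""
    q = 1.0 - p
    return {2: p * p, 1: 2 * p * q, 0: q * q}[g]


def posterior(genotypes: Sequence[int],
              freqs: Dict[str, List[float]]) -> Dict[str, float]:
    """Posterior over groups with equal priors. Log-likelihoods are summed
    across markers (independence assumption), then normalized after
    subtracting the maximum to avoid underflow."""
    loglik = {
        group: sum(log(genotype_prob(p, g)) for p, g in zip(ps, genotypes))
        for group, ps in freqs.items()
    }
    top = max(loglik.values())
    weights = {group: exp(ll - top) for group, ll in loglik.items()}
    total = sum(weights.values())
    return {group: w / total for group, w in weights.items()}


if __name__ == "__main__":
    post = posterior([2, 0, 2], FREQS)
    print(max(post, key=post.get))  # sub-Saharan African
```

Working in log space keeps the computation stable when a panel grows to the few hundred markers mentioned above; with real casework, the made-up table would be replaced by published reference frequencies.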