Abstract
Genome data of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are essential for virus diagnosis, vaccine development, and variant surveillance. To archive and integrate worldwide SARS-CoV-2 genome data, a series of resources have been constructed, serving as fundamental infrastructure for SARS-CoV-2 research, pandemic prevention and control, and coronavirus disease 2019 (COVID-19) therapy. Here we present an overview of extant SARS-CoV-2 resources devoted to genome data deposition and integration. We review deposition resources in terms of data accessibility, metadata standardization, and data curation and annotation. We review integrative resources in terms of data sources, de-redundancy processing, data curation and quality assessment, and variant annotation. Moreover, we address issues that impede SARS-CoV-2 genome data integration, including low-complexity, inconsistent, or absent isolate names, sequence inconsistency, asynchronous updates of genome data, and mismatched metadata. Finally, we provide insights into a consensus on data standardization and guidelines for data submission, to promote SARS-CoV-2 genome data sharing and integration.
Funding
Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38030201, XDB38030400, XDB38050300]
and the Youth Innovation Promotion Association of the Chinese Academy of Sciences [2019104].