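Abstract: Multi-omics data on life and health are an essential foundation for life science research and the development of biomedical technology. However, China lacks platforms for managing and sharing biological data, which not only fails to meet the growing research and development needs of domestic biomedicine and related disciplines but also severely constrains the integration, sharing, and translational use of China's biological big data. In view of this, the Beijing Institute of Genomics, Chinese Academy of Sciences, established the BIG Data Center (BIGD) in early 2016 to build a biological big data management platform and a multi-omics data resource system centered on national population health and important strategic biological resources. This article focuses on BIGD's life and health big data resource system, which mainly comprises a raw omics data archive, a genome database, a genome variation database, a gene expression database, a methylation database, a bioinformatics tool repository, and a life science wiki knowledge base. Together these provide submission, integration, and sharing services for biological big data, laying an important foundation for promoting life science data management in China and for advancing the construction of a national bioinformation center.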
Funding: This work was supported by grants from the Strategic Priority Research Program of Chinese Academy of Sciences (Grant Nos. XDA19090116, XDA19050302, and XDB38030400) awarded to SS, ZZ, and ML; the National Key R&D Program of China (Grant Nos. 2020YFC0848900, 2020YFC0847000, 2016YFE0206600, and 2017YFC0907502); the 13th Five-year Informatization Plan of Chinese Academy of Sciences (Grant No. XXH13505-05); Genomics Data Center Construction of Chinese Academy of Sciences (Grant No. XXH-13514-0202); the Open Biodiversity and Health Big Data Programme of International Union of Biological Sciences, International Partnership Program of Chinese Academy of Sciences (Grant No. 153F11KYSB20160008); and the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07). This work was also supported by K. C. Wong Education Foundation to ZZ and the Youth Innovation Promotion Association of Chinese Academy of Sciences (Grant Nos. 2017141 and 2019104) awarded to SS and ML.
Abstract: On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. Spatiotemporal change for each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 relevant literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature for a timely reflection, making it a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.
Funding: This work was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA19090116 to SS, Grant No. XDA19050302 to ZZ); National Key R&D Program of China (Grant Nos. 2020YFC0848900 and 2017YFC0907502); 13th Five-year Informatization Plan of Chinese Academy of Sciences (Grant No. XXH13505-05); K. C. Wong Education Foundation to ZZ, and International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008); the Youth Innovation Promotion Association of Chinese Academy of Sciences (Grant No. 2017141 to SS); National Natural Science Foundation of China (Grant No. 31671350 to JY); and the Key Research Program of Frontier Sciences, Chinese Academy of Sciences (Grant No. QYZDY-SSW-SMC017 to JY).
Abstract: COVID-19 and its causative pathogen SARS-CoV-2 have rushed the world into a staggering pandemic in a few months, and a global fight against both has been intensifying. Here, we describe an analysis procedure where genome composition and its variables are related, through the genetic code, to molecular mechanisms, based on understanding of RNA replication and its feedback loop from mutation to viral proteome sequence fraternity, including effective sites on the replicase-transcriptase complex. Our analysis starts with primary sequence information, identity-based phylogeny based on 22,051 SARS-CoV-2 sequences, and evaluation of sequence variation patterns as mutation spectra and their 12 permutations among organized clades. All are tailored to two key mechanisms: strand-biased and function-associated mutations. Our findings are listed as follows: 1) The most dominant mutation is the C-to-U permutation, whose abundant second-codon-position counts alter amino acid composition toward higher molecular weight and lower hydrophobicity, albeit assumed to be mostly slightly deleterious. 2) The second abundance group includes three negative-strand mutations (U-to-C, A-to-G, and G-to-A) and a positive-strand mutation (G-to-U) due to DNA repair mechanisms after cellular abasic events. 3) A clade-associated biased mutation trend is found, attributable to an elevated level of negative-sense strand synthesis. 4) Within-clade permutation variation is very informative for associating non-synonymous mutations and viral proteome changes. These findings demand a platform where emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes, to provide biological and clinical information after logical convergence for effective pharmaceutical and diagnostic applications. Such actions are in desperate need, especially in the middle of the war against COVID-19.
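As a rough illustration of what tallying a mutation spectrum over the 12 permutations might look like, the sketch below counts each ordered base substitution between a reference and one isolate. It is not the authors' pipeline: the function name `mutation_spectrum`, the toy sequences, and the assumption that the two sequences are already aligned to equal length are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"
# The 12 ordered substitution types, e.g. "C>U" (4 bases x 3 alternatives).
SUBSTITUTION_TYPES = [f"{a}>{b}" for a, b in permutations(BASES, 2)]


def mutation_spectrum(reference, isolate):
    """Count each of the 12 substitution types between two aligned sequences.

    Assumes both sequences are pre-aligned and of equal length (hypothetical
    input convention). Gaps and ambiguous bases such as "-" or "N" are skipped.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter({t: 0 for t in SUBSTITUTION_TYPES})
    for ref_base, iso_base in zip(reference.upper(), isolate.upper()):
        # Treat DNA-style "T" as RNA "U" so either notation works.
        ref_base = ref_base.replace("T", "U")
        iso_base = iso_base.replace("T", "U")
        if ref_base in BASES and iso_base in BASES and ref_base != iso_base:
            counts[f"{ref_base}>{iso_base}"] += 1
    return dict(counts)


# Toy sequences invented for illustration; not real SARS-CoV-2 data.
reference = "ACGUCCGA"
isolate = "AUGUCUGG"
spectrum = mutation_spectrum(reference, isolate)
print(spectrum["C>U"], spectrum["A>G"])
```

Summing such per-isolate counts within each clade would give one spectrum per clade, which is the kind of table a comparison of strand-biased trends could start from.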