Funding: Funded by the National Natural Science Foundation of China (Grant Nos. 42050101 and 42101467) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA23100101).
Abstract: Time is an essential reference system for recording objects, events, and processes in the geosciences. Various time references, such as the solar calendar, geological time, and regional calendars, are currently used to represent knowledge in different domains and regions. Interpreting temporal information expressed under different time references therefore requires a time conversion process. However, current time conversion methods have two limitations. First, they are constrained by the application scope of existing time ontologies (e.g., “Jurassic” is a period in a geological ontology but a point value in a calendar ontology). Second, they rely on experience during conversion. These issues prevent accurate and efficient calculation of temporal information across different time references. To address them, this paper proposes a Unified Time Framework (UTF) for the geosciences knowledge system. The UTF is based on a systematic parsing of time elements from a large number of time references. It introduces an independent time root node, which avoids traversing irrelevant nodes when different time types are accessed and accommodates the time expressions of different geoscience disciplines. The UTF also includes three further designs. Quantitative relationship definitions ensure the accuracy of time expressions. Unified time nodes and structures enable calculations across different time elements. Interfaces link the framework to the required external ontologies. Experiments comparing the UTF with time conversion methods show that it supports accurate and efficient calculation of temporal information across different time references in SPARQL queries. Its temporal information queries also perform better and more stably than those of the time conversion method. As geoscience enters the Big Data era, the UTF can be applied more widely to discover new geoscience knowledge across different time references.
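One way to realize unified time nodes is to place every time expression on a single quantitative axis. Geological periods and calendar dates can then be compared directly, without a separate conversion step. The Python sketch below illustrates this under stated assumptions and is not the UTF's actual ontology. Everything in it is hypothetical: the common axis (years relative to 1950 CE), the class `TimeNode`, and the helpers `from_ma`, `from_ce_year`, `before`, `meets`, and `contains`. The period boundaries are approximate values from the ICS chronostratigraphic chart.

```python
from dataclasses import dataclass

# Hypothetical common axis: years relative to 1950 CE (negative = earlier),
# chosen only for this illustration; the UTF's own axis is not specified here.
REFERENCE_YEAR_CE = 1950


@dataclass(frozen=True)
class TimeNode:
    """Hypothetical unified time node: every expression is a closed interval
    [start, end] on one numeric axis, so a point is just a short interval."""
    label: str
    start: float
    end: float

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError(f"{self.label}: start is after end")


def from_ma(label: str, older_ma: float, younger_ma: float) -> TimeNode:
    """Geological time given in millions of years before present (Ma)."""
    return TimeNode(label, -older_ma * 1e6, -younger_ma * 1e6)


def from_ce_year(label: str, year: int) -> TimeNode:
    """A Gregorian year as a one-year interval (CE only; no year-zero handling)."""
    return TimeNode(label, year - REFERENCE_YEAR_CE, year + 1 - REFERENCE_YEAR_CE)


def before(a: TimeNode, b: TimeNode) -> bool:
    """Allen-style relation: a ends strictly before b starts."""
    return a.end < b.start


def meets(a: TimeNode, b: TimeNode) -> bool:
    """Allen-style relation: a ends exactly where b starts."""
    return a.end == b.start


def contains(a: TimeNode, b: TimeNode) -> bool:
    """True if b lies entirely within a."""
    return a.start <= b.start and b.end <= a.end


# Approximate boundaries (Ma) after the ICS chronostratigraphic chart.
mesozoic = from_ma("Mesozoic", 251.9, 66.0)
jurassic = from_ma("Jurassic", 201.4, 145.0)
cretaceous = from_ma("Cretaceous", 145.0, 66.0)
year_2000 = from_ce_year("2000 CE", 2000)

print(before(jurassic, year_2000))   # True
print(meets(jurassic, cretaceous))   # True
print(contains(mesozoic, jurassic))  # True
```

Here “Jurassic” is an interval, like every other node. A calendar year is simply a much shorter interval on the same axis, so the relations apply uniformly to both.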
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42050101), the National Key Research and Development Program of China (Grant Nos. 2022YFB3904200 and 2021YFB00903), and the International Big Science Program of Deeptime Digital Earth (DDE).
Abstract: A geoscience knowledge graph (GKG) organizes diverse geoscience knowledge into a machine-understandable and computable semantic network. It is therefore an effective way to organize geoscience knowledge and provide knowledge-related services, and it has attracted significant attention as a research frontier in geoscience. Geoscience knowledge is derived from many disciplines and has complex spatiotemporal features and relationships at multiple scales, granularities, and dimensions. A GKG representation model that conforms to these characteristics is therefore the basis and prerequisite for constructing and applying a GKG. However, existing knowledge graph representation models rely on fixed tuples, which cannot fully represent complex spatiotemporal features and relationships. To address this issue, this paper first systematically analyzes the categories of geoscience knowledge and their spatiotemporal features and relationships. On this basis, it proposes an adaptive representation model for GKG that accounts for these complex spatiotemporal features and relationships. Under the constraint of a unified spatiotemporal ontology, the model adaptively uses different tuples to represent different types of geoscience knowledge according to their spatiotemporal correlation. The model represents geoscience knowledge efficiently and avoids representing spatiotemporal features in isolation. It also improves the accuracy and efficiency of geoscience knowledge retrieval. Through the spatiotemporal ontology, it further enables the alignment, transformation, computation, and reasoning of spatiotemporal information.
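The following Python sketch illustrates choosing a tuple form according to a fact's spatiotemporal correlation. The arity scheme is an assumption made purely for illustration. Knowledge without spatiotemporal attributes is a triple. Time or location each add one element, and both together give a quintuple. The class `Fact`, the function `represent`, and the example entities and coordinates are all hypothetical and are not the model's published definitions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """Hypothetical geoscience fact with optional spatiotemporal attributes."""
    head: str
    relation: str
    tail: str
    time: Optional[str] = None      # e.g. an ID of a node in a unified time ontology
    location: Optional[str] = None  # e.g. a WKT geometry or a spatial entity ID


def represent(fact: Fact) -> Tuple[str, Tuple[str, ...]]:
    """Return (tuple kind, tuple), choosing the arity from the attributes present."""
    base = (fact.head, fact.relation, fact.tail)
    if fact.time is not None and fact.location is not None:
        return "spatiotemporal quintuple", base + (fact.time, fact.location)
    if fact.time is not None:
        return "temporal quadruple", base + (fact.time,)
    if fact.location is not None:
        return "spatial quadruple", base + (fact.location,)
    return "triple", base


# Hypothetical example facts covering each tuple kind.
facts = [
    Fact("Jurassic", "isA", "GeologicalPeriod"),
    Fact("Sample_A", "hasMeasuredValue", "12.5 ppm Cu", time="2021-06-01"),
    Fact("Sample_B", "hasLithology", "granite", location="POINT(116.4 39.9)"),
    Fact("Station_X", "recorded", "Event_E",
         time="2021-05-21", location="POINT(99.9 25.7)"),
]
for f in facts:
    kind, tup = represent(f)
    print(kind, tup)
```

The sketch keeps the kind label alongside the tuple because a temporal quadruple and a spatial quadruple both have four elements. Without the label, they could be confused.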
Funding: Partially supported by the National Natural Science Foundation of China (NSFC) (Nos. 61721003 and 62250005), the National Key R&D Program of China (No. 2021YFF1200900), and the Tsinghua-Fuzhou Institute for Data Technology (No. TFIDT2021005).
Abstract: The rapid development of biological technology (BT) and information technology (IT), especially genomics and artificial intelligence (AI), has great potential to revolutionize future medicine. We propose the concept and framework of Digital Life Systems (dLife) as a new paradigm to unleash this potential. dLife has three components. The first is the multi-scale, multi-granularity measurement and representation of life in digital space. The second is the mathematical and/or computational modeling of the biology behind physiological and pathological processes. Ultimately, the third is cyber twins of healthy or diseased human bodies in virtual space, which can be used to simulate complex biological processes and deduce the effects of medical treatments. We advocate that dLife is the route toward future AI precision medicine and should be the new paradigm for future biological and medical research.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42050102).
Abstract: Earth science data have grown rapidly since the beginning of the 21st century, driven by improvements in experimental instruments and testing methods. This growth provides a basis for using big data to reveal the evolutionary history of life, climate, palaeogeography, and economic deposits. However, integrating Earth science data is a major challenge for several reasons. The Earth system is complex, and Earth science uses a large number of terms. Research methods and proxies are diverse, and data come in many different types.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41421001, 42050101, and 42050105).
Abstract: Since the beginning of the 21st century, geoscience research has been entering a significant transitional period. A new knowledge system is at its core, and big data is its driving force. The shift from the traditional encyclopedic, discipline-based knowledge system to a computer-understandable and operable knowledge graph is a revolutionary leap in geoscience knowledge discovery. The geoscience knowledge graph adopts the graph pattern of general knowledge representation and extends it with the spatiotemporal features unique to geoscience knowledge. It integrates geoscience knowledge elements such as maps, text, and numbers to establish an all-domain geoscience knowledge representation model. We develop a federated, crowd-intelligence-based collaborative method for constructing the geoscience knowledge graph, which allows geoscientists worldwide to build a high-quality professional knowledge graph together. We also develop a method for constructing a dynamic knowledge graph of multi-modal geoscience data based on in-depth text analysis. This method extracts geoscience knowledge from massive geoscience literature to build the most up-to-date and complete dynamic geoscience knowledge graph. A comprehensive and systematic geoscience knowledge graph can deepen existing geoscience big data analysis. It can also advance several other areas, among them the construction of a high-precision geological time scale driven by big data, the compilation of intelligent maps driven by rules and data, and the analysis of geoscience knowledge evolution and reasoning. It will further open new directions for geoscience research driven by both data and knowledge, and it will break new ground where geoscience, information science, and data science converge. Ultimately, it will enable original innovation in geoscience research and major theoretical breakthroughs in spatiotemporal big data research.
Funding: Partially supported by the National Key R&D Program of China (Grant No. 2018YFC0910404), the National Natural Science Foundation of China (Grant Nos. 61873141, 61721003, 61573207, 71871019, 71471016, 71531013, and 71729001), and the Tsinghua-Fuzhou Institute for Data Technology, China.
Abstract: Establishing a landscape of enhancers across human cells is crucial to deciphering the mechanisms of gene regulation, cell differentiation, and disease development. High-throughput experimental approaches have successfully reported enhancers in typical cell lines. However, they are still too costly and time-consuming for systematically identifying enhancers specific to different cell lines. Existing computational methods predict regulatory elements purely from DNA sequences, so they cannot screen for cell line-specific elements. Recent studies suggest that the chromatin accessibility of a DNA segment is closely related to its potential regulatory function. Accessibility may therefore provide useful information for identifying regulatory elements. Motivated by this understanding, we integrate DNA sequences and chromatin accessibility data to accurately predict enhancers in a cell line-specific manner. We propose DeepCAPE, a deep convolutional neural network that predicts enhancers by integrating DNA sequences and DNase-seq data. DeepCAPE benefits from a well-designed feature extraction mechanism and a skip connection strategy. As a result, it consistently outperforms existing methods in the imbalanced classification of cell line-specific enhancers against background sequences, and it self-adapts to datasets of different sizes. In addition, by adopting an autoencoder, our model can make cross-cell line predictions. We further visualize the kernels of the first convolutional layer and show that the identified sequence signatures match known motifs. Finally, we demonstrate the potential of our model to explain the functional implications of putative disease-associated genetic variants and to discriminate disease-related enhancers. The source code and a detailed tutorial for DeepCAPE are freely available at https://github.com/ShengquanChen/DeepCAPE.
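The pure-Python sketch below makes the idea of combining DNA sequence with chromatin accessibility concrete. It one-hot encodes a sequence, appends a per-base accessibility channel that stands in for a normalized DNase-seq signal, and scores the result with a single first-layer convolution kernel. It illustrates only the general input encoding and first-layer computation. It is not DeepCAPE's actual preprocessing or architecture, which are documented in its repository. The helper names, the toy `GATA` kernel, and the accessibility values are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of 4 values (A, C, G, T); unknown bases such as N are all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def add_accessibility(encoded, signal):
    """Append a per-base accessibility channel (hypothetical normalized DNase-seq values)."""
    if len(encoded) != len(signal):
        raise ValueError("sequence and accessibility track must have equal length")
    return [row + [float(s)] for row, s in zip(encoded, signal)]


def conv1d_scores(x, kernel):
    """Valid 1D cross-correlation of an L x C input with a K x C kernel,
    as computed by one convolutional filter (no bias, no activation)."""
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = 0.0
        for j in range(k):
            s += sum(a * w for a, w in zip(x[i + j], kernel[j]))
        scores.append(s)
    return scores


# Toy example: an 8-bp sequence whose middle lies in "open" chromatin.
seq = "TTGATAAC"
dnase = [0, 0, 1, 1, 1, 1, 0, 0]
x = add_accessibility(one_hot(seq), dnase)

# Toy kernel matching the substring GATA, with weight 0.5 on accessibility.
kernel = [row + [0.5] for row in one_hot("GATA")]

scores = conv1d_scores(x, kernel)
print(scores)  # [2.0, 1.5, 6.0, 2.5, 2.0]
```

The accessibility channel adds to the score, so the same sequence match scores higher when it lies in open chromatin. This offers one intuition for why combining the two signals can help cell line-specific prediction.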