Funding: Supported by the Natural Science Foundation of China (No. 42301492); the National Key Research and Development Program (No. 2022YFB3904200); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Open Fund of Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01); and the Fundamental Research Funds for the Central Universities.
Abstract: If progress is to be made toward improving geohazard management and emergency decision-making, then lessons need to be learned from past geohazard information. A geologic hazard report provides a useful and reliable source of information about the occurrence of an event, along with detailed information about the conditions or factors of the geohazard. Analyzing such reports, however, can be challenging because these texts are often presented in unstructured long-text formats and contain rich, specialized, and detailed information. Automatic text classification is commonly used to mine disaster text data in open domains (e.g., news and microblogs), but it is limited in capturing long-distance contextual dependencies and is insensitive to discourse order. These deficiencies are most obvious in long texts. Therefore, this paper uses bidirectional encoder representations from Transformers (BERT) to model long text, and then uses a softmax layer to automatically extract text features and classify geohazards without manually engineered features. The latent Dirichlet allocation (LDA) model is used to examine the interdependencies between causal variables and to visualize geohazards. The proposed method enables the machine-assisted interpretation of text-based geohazard information. Moreover, it can help users visualize the causes, processes, and other aspects of geohazards and assist decision-makers in emergency responses.
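The classification step named in this abstract (a softmax layer on top of an encoder) can be illustrated with a minimal sketch. It assumes the encoder has already produced one raw score (logit) per class; the label names and scores below are hypothetical and are not taken from the paper, and this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Return the most probable label together with all class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical geohazard classes and encoder scores, for illustration only.
LABELS = ["landslide", "debris flow", "collapse"]
label, probs = classify([2.0, 0.5, -1.0], LABELS)
print(label, [round(p, 3) for p in probs])
```

In a real pipeline the logits would come from a linear layer applied to the encoder's pooled output; only the normalization and argmax are shown here.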
Funding: The IUGS Deep-time Digital Earth (DDE) Big Science Program; financially supported by the National Key R&D Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01); the Fundamental Research Funds for the Central Universities; funded by the Joint Fund of Collaborative Innovation Center of Geo-Information Technology for Smart Central Plains, Henan Province and Key Laboratory of Spatiotemporal Perception and Intelligent processing, Ministry of Natural Resources (No. 212205).
Abstract: Open data initiatives have encouraged governmental agencies and scientific organizations to publish data online for reuse. Geoscience research focuses on processing georeferenced quantitative data (e.g., rock parameters, geochemical tests, geophysical surveys, and satellite imagery) to discover new knowledge. Geological knowledge is the cognitive result of human understanding of the spatial distribution, evolution, and interaction patterns of geological objects or processes. Knowledge graphs (KGs) can formalize unstructured knowledge into a structured form and have recently been used to support decision-making. In this paper, we propose a novel framework that can extract a geological knowledge graph (GKG) from public reports relating to a modelling study. Based on an analysis of the basic questions answered by geology, we summarize and abstract geological knowledge elements and then explore a geological knowledge representation model with three levels, “geological concepts-geological entities-geological relations”, to describe the semantic units of geological knowledge and their logical relations. Finally, based on the characteristics of mineral resource reports, a geological knowledge representation model oriented to “object relationships” and a hierarchical geological knowledge representation model oriented to “process relationships” are proposed with reference to commonly used geological knowledge graph representations. This research may inform the formalization and structured representation of geological knowledge graphs.
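One way to picture the three-level "concepts, entities, relations" representation is as three small record types. The sketch below is an assumption made for illustration: the class names, field names, and example values are hypothetical, and the paper's actual schema is not given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Top level: an abstract geological category."""
    name: str


@dataclass(frozen=True)
class Entity:
    """Middle level: a concrete instance of a concept."""
    name: str
    concept: Concept


@dataclass(frozen=True)
class Relation:
    """Bottom level: a typed link between two entities."""
    head: Entity
    predicate: str
    tail: Entity


def to_triples(relations):
    """Flatten relations into (head, predicate, tail) name triples."""
    return [(r.head.name, r.predicate, r.tail.name) for r in relations]


# Hypothetical example values, for illustration only.
structure = Concept("structure")
rock_body = Concept("rock body")
fault = Entity("Fault F1", structure)
granite = Entity("Granite body A", rock_body)
triples = to_triples([Relation(fault, "cuts", granite)])
print(triples)
```

Keeping the concept on each entity lets a consumer check that a relation such as "cuts" only links entity types for which it makes sense.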
Funding: The IUGS Deep-time Digital Earth (DDE) Big Science Program; financially supported by the National Key R&D Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01); and the Fundamental Research Funds for the Central Universities.
Abstract: The occurrence of geological disasters can have a large impact on urban safety. Protecting people's safety is the most important concern when disasters occur. Improving safety requires comprehensive and representative risk analysis and a large collection of information related to geological hazards, including unstructured knowledge and experience. To organize the relevant information and support safety risk analysis, a geological hazard knowledge graph is developed automatically based on computer vision and a geoscience domain ontology to identify geological hazards from input images while obeying safety rules and regulations, even when affected by changes. In implementing the knowledge graph, we design an ontology schema of geological disasters using a top-down approach; because knowledge is organized as a logical semantic expression, it can be shared using ontology technologies, which enables semantic interoperability. Computer vision approaches are then used to automatically detect a set of entities and attributes from the input images, and object types and their attributes are identified so that they can be stored in Neo4j for reasoning and searching. Finally, a reasoning model for geological hazard identification is developed using the Neo4j database to create nodes, relationships, and their properties for modeling, so that geological hazards in the images can be automatically identified by searching the Neo4j database. An application to geological hazards is presented. The results show the effectiveness of the proposed approach in identifying potential geological hazards and assisting in formulating targeted preventive measures.
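The store-then-search idea described here (detected objects become nodes, rules become relationships, and hazards are found by querying) can be sketched without a database. The in-memory `TinyGraph` class below is a hypothetical stand-in for Neo4j, and the `INDICATES` relationship type and example objects are assumptions made for illustration, not the paper's schema.

```python
from collections import defaultdict


class TinyGraph:
    """A minimal in-memory property graph (hypothetical stand-in for Neo4j)."""

    def __init__(self):
        self.nodes = {}                # node name -> {"label": ..., other properties}
        self.edges = defaultdict(set)  # (source, relationship type) -> set of targets

    def add_node(self, name, label, **props):
        self.nodes[name] = {"label": label, **props}

    def add_edge(self, source, rel_type, target):
        self.edges[(source, rel_type)].add(target)

    def neighbors(self, source, rel_type):
        return sorted(self.edges.get((source, rel_type), ()))


def hazards_for(graph, detected_objects):
    """Collect every hazard linked to any detected object by an INDICATES edge."""
    found = set()
    for obj in detected_objects:
        found.update(graph.neighbors(obj, "INDICATES"))
    return sorted(found)


# Hypothetical knowledge, for illustration only.
g = TinyGraph()
g.add_node("tension crack", "Object")
g.add_node("landslide", "Hazard")
g.add_edge("tension crack", "INDICATES", "landslide")
result = hazards_for(g, ["tension crack", "road"])
print(result)
```

With a real graph database the same lookup would be a pattern-matching query over nodes and relationships; the point here is only the shape of the data and the search.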
Funding: The IUGS Deep-time Digital Earth (DDE) Big Science Program; financially supported by the National Key R&D Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01); the Fundamental Research Funds for the Central Universities; State Key Laboratory of Geo-Information Engineering and Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, Chinese Academy of Surveying and Mapping (No. 2022-03-08); the Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, MNR (No. KFKT-2022-02); and the Project of Chengdu Municipal Bureau of Planning and Natural Resources (No. 5101012018002703).
Abstract: Artificial intelligence (AI) is the key to mining and enhancing the value of big data, and the knowledge graph is one of the cornerstones of AI, serving as the core foundation for integrating statistical and physical representations. Named entity recognition is a fundamental research task for building knowledge graphs and needs to be supported by a high-quality corpus, yet such a corpus is currently lacking in the field of geology, especially for Chinese. In this paper, based on the conceptual structure of geological ontology and an analysis of the characteristics of geological texts, a classification system of geological named entity types is designed with the guidance and participation of geological experts, a corresponding annotation specification is formulated, an annotation tool is developed, and the first named entity recognition corpus for the geological domain is annotated based on real geological reports. The total number of words annotated was 698,512 and the number of entities was 23,345. The paper also explores the feasibility of a model pre-annotation strategy and presents a statistical analysis of the distribution of technical and term categories across genres and of the consistency of corpus annotation. Based on this corpus, the A Lite Bidirectional Encoder Representations from Transformers (ALBERT)-Bi-directional Long Short-Term Memory (BiLSTM)-Conditional Random Fields (CRF) and ALBERT-BiLSTM models are selected for experiments, and the results show that the F1-scores of the two models reach 0.75 and 0.65, respectively, providing a corpus basis and technical support for information extraction in the field of geology.
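The F1-scores reported above are usually computed at the entity level, where a prediction counts only if its span and type both match the gold annotation. A minimal sketch of that metric follows; the exact-match convention is an assumption (the abstract does not state the matching rule), and the spans and type names are made up for illustration.

```python
def prf1(gold, predicted):
    """Entity-level precision, recall, and F1 with exact span-and-type matching.

    Each entity is a (start, end, type) tuple.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations, for illustration only.
gold = [(0, 2, "ROCK"), (5, 7, "STRATUM"), (9, 10, "MINERAL")]
pred = [(0, 2, "ROCK"), (5, 7, "MINERAL")]  # second span has the wrong type
p, r, f1 = prf1(gold, pred)
print(p, round(r, 3), round(f1, 3))
```

Here one of two predictions is correct (precision 0.5) and one of three gold entities is found (recall 1/3), giving an F1 of 0.4.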
Funding: Financially supported by the National Key R&D Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014); the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); and the Beijing Key Laboratory of Urban Spatial Information Engineering (No. 20220108).
Abstract: Geological knowledge can support knowledge discovery, knowledge inference, and mineralization prediction from geological big data. Entity identification and relationship extraction from geological descriptive text are the key steps in constructing knowledge graphs. Given the lack of publicly annotated datasets in the geology domain, this paper illustrates the construction process of geological entity datasets, defines the types of entities and interconceptual relationships by using the geological entity concept system, and completes the construction of the geological corpus. To address the shortcomings of existing language models (such as Word2vec and GloVe), which cannot handle polysemous words and have a poor ability to fuse context, we propose a geological named entity recognition and relationship extraction model based on the Bidirectional Encoder Representation from Transformers (BERT) pretrained language model. To effectively represent the text features, we construct a BERT-bidirectional gated recurrent unit network (BiGRU)-conditional random field (CRF)-based architecture to extract the named entities and a BERT-BiGRU-Attention-based architecture to extract the entity relations. The results show that the F1-score of the BERT-BiGRU-CRF named entity recognition model is 0.91 and the F1-score of the BERT-BiGRU-Attention relationship extraction model is 0.84, which are significant performance improvements compared with classic language models (e.g., word2vec and Embedding from Language Models (ELMo)).
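The CRF layer at the end of a BERT-BiGRU-CRF tagger picks the best tag sequence by combining per-token scores with tag-to-tag transition scores, typically via Viterbi decoding. The sketch below shows only that decoding step, assuming the per-token (emission) scores have already been produced by the encoder; the tag set and all scores are hypothetical, and this is not the authors' code.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions:   one {tag: score} dict per token (must be non-empty)
    transitions: {(previous_tag, current_tag): score}; missing pairs score 0.0
    """
    # best[t][tag] = (score of the best path ending in `tag` at step t, previous tag)
    best = [{tag: (emissions[0][tag], None) for tag in tags}]
    for scores in emissions[1:]:
        column = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p][0] + transitions.get((p, cur), 0.0))
            total = best[-1][prev][0] + transitions.get((prev, cur), 0.0) + scores[cur]
            column[cur] = (total, prev)
        best.append(column)
    # Backtrack from the best final tag through the stored previous-tag pointers.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical tags and scores, for illustration only.
TAGS = ["O", "B-ROCK", "I-ROCK"]
TRANSITIONS = {("O", "I-ROCK"): -10.0}  # an inside tag should not follow O
EMISSIONS = [
    {"O": 1.0, "B-ROCK": 0.8, "I-ROCK": 0.0},
    {"O": 0.0, "B-ROCK": 0.0, "I-ROCK": 1.0},
]
decoded = viterbi(EMISSIONS, TRANSITIONS, TAGS)
print(decoded)
```

Taken token by token, the highest scores would give the invalid sequence O, I-ROCK; the transition penalty makes the decoder prefer B-ROCK, I-ROCK instead, which is the main benefit a CRF layer adds over independent per-token classification.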
Funding: The IUGS Deep-time Digital Earth (DDE) Big Science Program; financially supported by the National Key R&D Program of China (No. 2022YFB3904200); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-202207-014); the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01); and the Fundamental Research Funds for the Central Universities.
Abstract: Much detailed data on past geological hazard events is buried in geological hazard reports and has not been fully utilized. Advances in geographic information retrieval and temporal information retrieval offer opportunities to analyse this wealth of data, mine the spatiotemporal evolution of geological disaster occurrence, and enhance risk decision-making. This study presents an information extraction framework that combines NLP and ontology matching to automatically recognize semantic and spatiotemporal information in geological hazard reports. The framework extracts information from unstructured geological disaster reports through named entity recognition, ontology matching, and gazetteer matching to identify and annotate elements, enabling users to quickly obtain key information and understand the general content of disaster reports. In addition, we visualize the final experimental results and analyse the visualizations. Semantic information related to the dynamics of geohazard events is extracted and retrieved from both natural and human perspectives to provide information on the progress of events.
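Gazetteer matching and temporal extraction, two of the steps listed in this abstract, can be sketched with the standard library alone. The example assumes a tiny hypothetical gazetteer that maps place names to identifiers and ISO-style dates in the text; real reports would need a much richer gazetteer and date grammar, and none of the names below come from the paper.

```python
import re

# Matches dates written as YYYY-MM-DD (an assumed format for this sketch).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_dates(text):
    """Return every YYYY-MM-DD date string found in the text."""
    return DATE_PATTERN.findall(text)


def match_gazetteer(text, gazetteer):
    """Find place names, preferring the longest name when matches overlap.

    Returns (start, end, name, place_id) tuples sorted by position.
    Offsets assume that lowercasing does not change the text length (true for ASCII).
    """
    lowered = text.lower()
    taken = [False] * len(text)
    hits = []
    for name in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(re.escape(name.lower()), lowered):
            if not any(taken[m.start():m.end()]):
                hits.append((m.start(), m.end(), name, gazetteer[name]))
                for i in range(m.start(), m.end()):
                    taken[i] = True
    return sorted(hits)


# Hypothetical gazetteer and report sentence, for illustration only.
GAZETTEER = {"North Valley Town": "place-001", "North Valley": "place-002"}
TEXT = "On 2020-07-08 a landslide struck North Valley Town after heavy rain."
dates = extract_dates(TEXT)
places = match_gazetteer(TEXT, GAZETTEER)
print(dates, places)
```

Longest-match-first keeps "North Valley Town" from also being reported as the shorter "North Valley", a common source of duplicate or wrong locations in gazetteer lookups.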
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42050101) and the National Key Research and Development Program of China (Grant Nos. 2022YFB3904200 & 2021YFB00903); supported by the International Big Science Program of Deeptime Digital Earth (DDE).
Abstract: A geoscience knowledge graph (GKG) can organize various kinds of geoscience knowledge into a machine-understandable and computable semantic network and is an effective way to organize geoscience knowledge and provide knowledge-related services. As a result, it has gained significant attention and become a frontier in geoscience. Geoscience knowledge is derived from many disciplines and has complex spatiotemporal features and relationships of multiple scales, granularities, and dimensions. Therefore, establishing a GKG representation model that conforms to the characteristics of geoscience knowledge is the basis and premise for the construction and application of GKGs. However, existing knowledge graph representation models rely on fixed tuples, which are limited in fully representing complex spatiotemporal features and relationships. To address this issue, this paper first systematically analyzes the categorization and the spatiotemporal features and relationships of geoscience knowledge. On this basis, an adaptive representation model for GKGs is proposed that takes the complex spatiotemporal features and relationships into account. Under the constraint of a unified spatiotemporal ontology, this model adopts different tuples to adaptively represent different types of geoscience knowledge according to their spatiotemporal correlation. The model can efficiently represent geoscience knowledge, thereby avoiding isolated representation of spatiotemporal features and improving the accuracy and efficiency of geoscience knowledge retrieval. It can further enable the alignment, transformation, computation, and reasoning of spatiotemporal information through a spatiotemporal ontology.
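The contrast between fixed tuples and adaptive tuples can be made concrete with a small sketch. The abstract does not give the paper's actual tuple definitions, so the layout below (a plain triple that grows time and location slots only when a fact has them) is an assumption made for illustration, and all field names and example facts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    time: Optional[str] = None      # e.g. a geologic age or interval label
    location: Optional[str] = None  # e.g. a place name or geometry identifier


def to_tuple(fact):
    """Emit a 3-, 4-, or 5-tuple depending on which spatiotemporal slots are set."""
    parts = [fact.subject, fact.predicate, fact.obj]
    if fact.time is not None:
        parts.append(("time", fact.time))
    if fact.location is not None:
        parts.append(("location", fact.location))
    return tuple(parts)


# Hypothetical facts, for illustration only.
timeless = to_tuple(Fact("quartz", "is_a", "mineral"))
situated = to_tuple(
    Fact("Formation X", "deposited_in", "Basin Y", time="Jurassic", location="region-42")
)
print(timeless)
print(situated)
```

A fixed-triple model would have to push the time and location into separate statements, which is the isolation of spatiotemporal features that the abstract says the adaptive model avoids.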