Funding: Financially supported by the Natural Science Foundation of China (Grant No. 42301492); the National Key R&D Program of China (Grant Nos. 2022YFF0711600, 2022YFF0801201, 2022YFF0801200); the Major Special Project of Xinjiang (Grant No. 2022A03009-3); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (Grant No. KF-2022-07014); the Opening Fund of the Key Laboratory of the Geological Survey and Evaluation of the Ministry of Education (Grant No. GLAB 2023ZR01); and the Fundamental Research Funds for the Central Universities.
Abstract: As important geological data, a geological report contains rich expert and geological knowledge. The challenge facing current research into geological knowledge extraction and mining is how to achieve an accurate understanding of geological reports guided by domain knowledge. Generic named entity recognition models and tools can be applied to geoscience reports and documents, but their effectiveness is limited by a lack of domain-specific knowledge, which leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities with reference to the ontological system of the geological domain and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT, adversarial training, Bi-directional Long Short-Term Memory, Global Pointer) is proposed to address the ambiguity, diversity, and nesting of geological entities. The model first uses the fine-tuned, word-granularity-based pre-trained model GeoWoBERT (Geological Word-base BERT) and combines it with text features extracted by a BiLSTM (Bi-directional Long Short-Term Memory). An adversarial training algorithm is then applied to improve the robustness of the model and its resistance to interference, and decoding is finally performed with a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining rich geological information.
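The nested-entity problem that the Global Pointer decoder addresses can be illustrated with a minimal sketch. Each candidate span (i, j) is scored per entity type as the dot product of a "start" vector for token i and an "end" vector for token j. Every span above a threshold is kept independently, so a short entity can sit inside a longer one. This is a simplified illustration, not the authors' implementation. It assumes the start/end vectors are already projected from the encoder (BiLSTM) output, and it omits the rotary position encoding and 1/sqrt(d) scaling used in the published Global Pointer. The entity labels `STRATUM` and `ROCK` and all vector values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def span_scores(q: Sequence[Vector], k: Sequence[Vector]) -> Dict[Tuple[int, int], float]:
    """Score every span (i, j) with i <= j as the dot product q_i . k_j."""
    n = len(q)
    return {
        (i, j): sum(a * b for a, b in zip(q[i], k[j]))
        for i in range(n)
        for j in range(i, n)  # a span cannot end before it starts
    }


def decode_entities(
    per_type: Dict[str, Tuple[Sequence[Vector], Sequence[Vector]]],
    threshold: float = 0.0,
) -> List[Tuple[str, int, int]]:
    """Return (type, start, end) for every span scoring above the threshold.

    Each entity type has its own start/end vectors and every span is judged
    independently, so nested or overlapping entities are kept side by side.
    """
    found = []
    for etype, (q, k) in per_type.items():
        for (i, j), score in span_scores(q, k).items():
            if score > threshold:
                found.append((etype, i, j))
    return sorted(found, key=lambda e: (e[1], e[2], e[0]))


# Toy 4-token sentence with 2-dimensional vectors (values invented for illustration).
per_type = {
    "STRATUM": ([[1, 0], [0, 0], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0], [1, 0]]),
    "ROCK": ([[0, 0], [0, 0], [0, 1], [0, 0]], [[0, 0], [0, 0], [0, 0], [0, 1]]),
}
print(decode_entities(per_type))  # [('STRATUM', 0, 3), ('ROCK', 2, 3)]
```

The `ROCK` span (tokens 2 to 3) is nested inside the `STRATUM` span (tokens 0 to 3). A flat sequence-labelling decoder such as CRF cannot emit both.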
Funding: Supported by the 2019 World Class University Research Fund of Bandung Institute of Technology for International Research (No. LPPM.PN-10-30-2019) and the 2018 Overseas Research Grants of the Asahi Glass Foundation (No. FTTM.PN-5-01-2019).
Abstract: This study investigates the relationship between the Coulomb failure stress change caused by the mainshock and the aftershock sequence following the 2018 Palu-Donggala earthquake in Indonesia. We calculate the Coulomb failure stress using the available coseismic fault models, whose moment magnitudes range from Mw 7.53 to Mw 7.62. Previous studies suggested different interpretations of the fault sources. Two fault models suggested that a single inland fault segment ruptured during the earthquake. Another fault model proposed that two fault segments ruptured, one inland in Central Sulawesi and one along the coast of Palu Bay. We then overlay the positive and negative values of the Coulomb failure stress on the reported relocated aftershocks. We find that Coulomb failure stress analysis alone cannot determine which coseismic fault model better explains the aftershock distribution. This investigation demonstrates that additional observational data from geological field surveys are required to identify the surface rupture for comparison with the coseismic fault models.
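The overlay step can be sketched with the standard Coulomb failure stress change, ΔCFS = Δτ + μ′Δσn. Here Δτ is the shear stress change in the receiver fault's slip direction, Δσn is the normal stress change (tension positive), and μ′ is the effective friction coefficient. This is a minimal sketch under stated assumptions: the stress changes at each aftershock are taken as already resolved from a coseismic slip model, and μ′ = 0.4 is a commonly assumed value rather than one given in the abstract. The `Aftershock` record and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Aftershock:
    event_id: str
    d_shear_mpa: float   # shear stress change resolved in the receiver fault's slip direction (MPa)
    d_normal_mpa: float  # normal stress change, positive = unclamping/tension (MPa)


def coulomb_stress_change(d_shear: float, d_normal: float, mu_eff: float = 0.4) -> float:
    """Delta CFS = delta tau + mu' * delta sigma_n, with tension taken as positive."""
    return d_shear + mu_eff * d_normal


def split_by_sign(events: List[Aftershock], mu_eff: float = 0.4) -> Dict[str, List[str]]:
    """Group aftershock IDs by whether they fall in a positive or non-positive Delta CFS region."""
    groups: Dict[str, List[str]] = {"positive": [], "non_positive": []}
    for ev in events:
        dcfs = coulomb_stress_change(ev.d_shear_mpa, ev.d_normal_mpa, mu_eff)
        groups["positive" if dcfs > 0 else "non_positive"].append(ev.event_id)
    return groups


# Toy stress changes (invented values, not results from the Palu-Donggala models).
events = [
    Aftershock("A", 0.05, 0.10),   # 0.05 + 0.4 * 0.10 = 0.09 MPa
    Aftershock("B", -0.02, 0.01),  # -0.02 + 0.4 * 0.01 = -0.016 MPa
    Aftershock("C", 0.01, -0.05),  # 0.01 + 0.4 * (-0.05) = -0.01 MPa
]
groups = split_by_sign(events)
print(groups)
```

Repeating this grouping for each candidate fault model gives one positive/non-positive split per model. The abstract reports that such splits alone did not single out a preferred model.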
Funding: The IUGS Deep-time Digital Earth (DDE) Big Science Program; financially supported by the National Key R&D Program of China (No. 2022YFB3904200); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-202207-014); the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01); and the Fundamental Research Funds for the Central Universities.
Abstract: Much detailed data on past geological hazard events is buried in geological hazard reports and has not been fully utilized. Growing developments in geographic and temporal information retrieval make it possible to analyse this data, mine the spatiotemporal evolution of geological disaster occurrence, and improve risk decision-making. This study presents an information extraction framework that combines NLP and ontology matching to automatically recognize semantic and spatiotemporal information in geological hazard reports. The framework extracts unstructured information from geological disaster reports mainly through named entity recognition, ontology matching, and gazetteer matching, which identify and annotate elements. This lets users quickly obtain key information and understand the general content of disaster reports. In addition, we present the final experimental results through appropriate visualizations and analyse them. Semantic information related to the dynamics of geohazard events is extracted and retrieved from both natural and human perspectives to provide information on the progress of events.
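The dictionary-driven parts of such a framework (gazetteer matching for places, ontology matching for hazard types) plus a simple temporal pattern can be sketched as follows. This is a toy illustration, not the authors' framework. The statistical NER component is not reproduced, and a longest-match rule is assumed for resolving overlaps. The place names `Qinglong Village`, `Qinglong`, and `Baishui Town` and the ontology labels are hypothetical, and only ISO-style dates are recognized.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy resources; a real system would load a full gazetteer and a geohazard ontology.
GAZETTEER: Dict[str, str] = {
    "Qinglong Village": "village",
    "Qinglong": "township",
    "Baishui Town": "town",
}
HAZARD_ONTOLOGY: Dict[str, str] = {
    "landslide": "Landslide",
    "debris flow": "DebrisFlow",
    "rockfall": "Rockfall",
}
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO dates only, for brevity

Candidate = Tuple[int, int, str, str]  # (start, end, category, normalized label)


def lexicon_matches(text: str, lexicon: Dict[str, str], category: str) -> List[Candidate]:
    """Find every case-insensitive occurrence of each lexicon term."""
    lowered = text.lower()
    found: List[Candidate] = []
    for term, label in lexicon.items():
        start = lowered.find(term.lower())
        while start != -1:
            end = start + len(term)
            found.append((start, end, category, label))
            start = lowered.find(term.lower(), end)
    return found


def annotate(text: str) -> List[Tuple[str, str, str]]:
    """Return (surface text, category, label) tuples in text order, preferring longest matches."""
    candidates = lexicon_matches(text, GAZETTEER, "PLACE")
    candidates += lexicon_matches(text, HAZARD_ONTOLOGY, "HAZARD")
    candidates += [(m.start(), m.end(), "TIME", m.group()) for m in DATE_PATTERN.finditer(text)]
    # Longest spans first, so "Qinglong Village" wins over the shorter "Qinglong".
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    accepted: List[Candidate] = []
    for c in candidates:
        if all(c[1] <= a[0] or c[0] >= a[1] for a in accepted):
            accepted.append(c)
    accepted.sort(key=lambda c: c[0])
    return [(text[s:e], cat, label) for s, e, cat, label in accepted]


report = "On 2020-07-15 a landslide struck Qinglong Village in Baishui Town, followed by a debris flow."
for item in annotate(report):
    print(item)
```

Resolving overlaps by preferring the longest span is one simple design choice. A production system would more likely combine these dictionary hits with NER model output and score conflicts explicitly.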
Funding: The IUGS Deep-time Digital Earth (DDE) Big Science Program; financially supported by the National Key R&D Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01); the Fundamental Research Funds for the Central Universities, State Key Laboratory of Geo-Information Engineering and Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, Chinese Academy of Surveying and Mapping (No. 2022-03-08); the Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, MNR (No. KFKT-2022-02); and the Project of Chengdu Municipal Bureau of Planning and Natural Resources (No. 5101012018002703).
Abstract: Artificial intelligence (AI) is key to mining and enhancing the value of big data. The knowledge graph, one of the important cornerstones of AI, is the core foundation for integrating statistical and physical representations. Named entity recognition (NER) is a fundamental research task for building knowledge graphs and must be supported by a high-quality corpus. However, high-quality NER corpora are currently lacking in the field of geology, especially in Chinese. In this paper, a classification system of geological named entity types is designed with the guidance and participation of geological experts. The design is based on the conceptual structure of geological ontology and an analysis of the characteristics of geological texts. A corresponding annotation specification is formulated and an annotation tool is developed. The first NER corpus for the geological domain is then annotated from real geological reports. A total of 698 512 words were annotated, containing 23 345 entities. The paper also explores the feasibility of a model pre-annotation strategy. It presents a statistical analysis of the distribution of technical and term categories across genres and of the consistency of the corpus annotation. Based on this corpus, two models are selected for experiments: A Lite Bidirectional Encoder Representations from Transformers (ALBERT)-Bi-directional Long Short-Term Memory (BiLSTM)-Conditional Random Fields (CRF), and ALBERT-BiLSTM. The results show that the F1-scores of the two models reach 0.75 and 0.65, respectively. These results provide a corpus basis and technical support for information extraction in the field of geology.
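F1-scores such as the 0.75 and 0.65 reported above are commonly computed at the entity level: a predicted entity counts only if its type and boundaries exactly match a gold entity. The sketch below shows that standard exact-match metric. It assumes BIO tagging, since the abstract does not state the corpus's tagging scheme, and it may differ from the authors' exact evaluation protocol. The labels `ROCK` and `STRAT` and the example tags are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags into a set of typed entity spans."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an entity at the end
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))
            # B- (or a stray I-) opens a new entity; O closes without opening one.
            etype = None if tag == "O" else tag[2:]
            start = i
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 over aligned sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical one-sentence example: the model finds the ROCK entity but misses STRAT.
gold = [["B-ROCK", "I-ROCK", "O", "B-STRAT"]]
pred = [["B-ROCK", "I-ROCK", "O", "O"]]
print(entity_prf(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```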