Funding: This work was financially supported by the National Key R&D Program of China (No. 2022YFF0711601); the Natural Science Foundation of Hubei Province of China (No. 2022CFB640); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014); the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04); and the Beijing Key Laboratory of Urban Spatial Information Engineering (No. 20220108).
Abstract: Geological knowledge can support knowledge discovery, knowledge inference, and mineralization prediction over geological big data. Entity identification and relationship extraction from geological descriptive text are the key steps in constructing knowledge graphs. Given the lack of publicly available annotated datasets in the geology domain, this paper describes the construction of a geological entity dataset, defines the entity types and inter-concept relationships using the geological entity concept system, and completes the construction of a geological corpus. Existing word-embedding models (such as Word2vec and GloVe) cannot handle polysemous words and fuse context poorly. To address these shortcomings, we propose a geological named entity recognition and relationship extraction model built on the Bidirectional Encoder Representations from Transformers (BERT) pretrained language model. To represent text features effectively, we construct a BERT-bidirectional gated recurrent unit (BiGRU)-conditional random field (CRF) architecture to extract named entities and a BERT-BiGRU-Attention architecture to extract entity relations. The results show that the BERT-BiGRU-CRF named entity recognition model achieves an F1-score of 0.91 and the BERT-BiGRU-Attention relationship extraction model achieves an F1-score of 0.84, both significant improvements over classic language models (e.g., Word2vec and Embeddings from Language Models (ELMo)).
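The role of the CRF layer in a BERT-BiGRU-CRF tagger can be illustrated with a small sketch. The abstract does not give the tagging scheme, the decoding procedure, or any scores, so everything below is an assumption made for illustration: the BIO-style tag names (`B-ROCK`, `I-ROCK`, `O`), the emission and transition values, and the function name `viterbi_decode`. The sketch shows generic Viterbi decoding for a linear-chain CRF, in which per-token emission scores (which in such an architecture would come from the encoder) are combined with tag-to-tag transition scores to pick the best whole sequence rather than the best tag per token.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag]       : score of `tag` at token t (hypothetical encoder output)
    transitions[prev][curr] : score of moving from tag `prev` to tag `curr`
    """
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag] = best previous tag
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for curr in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][curr])
            scores[curr] = best[-1][prev] + transitions[prev][curr] + emissions[t][curr]
            pointers[curr] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen only to illustrate the idea.
tags = ["O", "B-ROCK", "I-ROCK"]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I-ROCK"] = -100.0  # an entity cannot continue right after "O"

emissions = [
    {"O": 1.0, "B-ROCK": 0.8, "I-ROCK": 0.0},
    {"O": 0.0, "B-ROCK": 0.2, "I-ROCK": 1.0},
    {"O": 1.0, "B-ROCK": 0.0, "I-ROCK": 0.0},
]

# Picking the best tag per token independently yields an invalid sequence.
greedy = [max(tags, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, tags)

print(greedy)   # ['O', 'I-ROCK', 'O']
print(decoded)  # ['B-ROCK', 'I-ROCK', 'O']
```

In this toy case the per-token choice starts an entity with `I-ROCK` directly after `O`, while the sequence-level decoding accepts a slightly lower score on the first token to produce a well-formed entity span. In a trained model the transition scores would be learned rather than set by hand.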