Funding: Supported by the National Natural Science Foundation of China (71804017), the R&D Program of Beijing Municipal Education Commission (KZ202210005013), and the Sichuan Social Science Planning Project (SC22B151).
Abstract: To address the lack of a classification scheme and a high-quality standard corpus for joint entity and relation extraction in the Chinese academic domain, this paper builds a management science dataset that can be used for joint entity and relation extraction, and establishes a deep learning model to extract entity and relation information from scientific texts. After defining the entity and relation categories, we build a Chinese scientific text corpus from the abstracts of projects funded by the National Natural Science Foundation of China (NSFC) in 2018–2019. By combining word2vec features with clue-word features, a stylistic characteristic of scientific documents, we establish a joint entity and relation extraction model based on the BiLSTM-CNN-CRF architecture for scientific information extraction. The constructed dataset contains 13,060 unique entities and 9,728 entity relation labels. For entity prediction, the model reaches a precision of 69.15%, a recall of 61.03%, and an F1 value of 64.83%. For relation prediction, precision is higher than for entity prediction, which reflects the effectiveness of the mixed input features and of integrating local features through the CNN layer.
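As a quick consistency check on the entity-level figures, the F1 value can be recomputed as the harmonic mean of the two reported rates. This is a minimal sketch that assumes the 69.15% and 61.03% figures are precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Entity-level figures reported in the abstract, in percent.
entity_f1 = f1_score(69.15, 61.03)
print(f"{entity_f1:.4f}")
```

The recomputed value agrees with the reported 64.83% to within 0.01 percentage points, which is consistent with rounding or truncation of the underlying figures.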
Funding: Hainan Province High-Level Talent Project of the Basic and Applied Basic Research Plan (Natural Science Field) in 2019 (No. 2019RC100); Haikou City Key Science and Technology Plan Project (2020–049); Hainan Province Key Research and Development Project (ZDYF2020018).
Abstract: Entity relation extraction (ERE) is an important task in the field of information extraction. With the wide application of pre-trained language models (PLMs) in natural language processing (NLP), using PLMs has become a new research direction for ERE. In this paper, BERT is used to extract entity relations, and a separated pipeline architecture is proposed. ERE is decomposed into an entity-relation classification sub-task and an entity-pair annotation sub-task. Both sub-tasks are pre-trained and fine-tuned independently. Combining dynamic and static masking, new Verb-MLM and Entity-MLM BERT pre-training tasks are put forward to strengthen the correlation between BERT pre-training and the targeted downstream NLP task, ERE. An inter-layer shared attention mechanism is added to the model, sharing attention parameters according to the similarity of the attention matrices. Comparative experiments on the SemEval-2010 Task 8 dataset demonstrate that the new MLM tasks and the inter-layer shared attention mechanism effectively improve the performance of BERT on entity relation extraction.
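The difference between static and dynamic masking can be illustrated with a small sketch. The abstract does not say how the two are combined in Verb-MLM and Entity-MLM, so everything here is an assumption made for illustration: the helper names, the toy sentence, the entity positions, and the choice to mask entity tokens statically while re-sampling the remaining tokens on every pass.

```python
import random

MASK = "[MASK]"


def static_mask(tokens, positions):
    """Mask a fixed set of positions, chosen once and reused every epoch."""
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def dynamic_mask(tokens, rate, rng, protected=frozenset()):
    """Re-sample masked positions on every call, leaving protected ones as-is."""
    return [
        MASK if i not in protected and rng.random() < rate else tok
        for i, tok in enumerate(tokens)
    ]


# Hypothetical example: positions 1 and 3 are treated as entity tokens.
tokens = ["the", "model", "extracts", "relations", "from", "text"]
entity_positions = {1, 3}
rng = random.Random(0)  # seeded only to make the demo repeatable

for epoch in range(2):
    fixed = static_mask(tokens, entity_positions)          # same every epoch
    mixed = dynamic_mask(fixed, 0.15, rng, entity_positions)  # may differ per epoch
    print(epoch, mixed)
```

In this sketch the statically masked positions are identical in every epoch, while the dynamically masked positions may change from one pass to the next.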