RCMR 280k: Refined Corpus for Move Recognition Based on PubMed Abstracts
1
Authors: Jie Li, Gaihong Yu, Zhixiong Zhang. Data Intelligence (EI), 2023, No. 3, pp. 511-536 (26 pages)
Existing datasets for move recognition, such as PubMed 200k RCT, exhibit several problems that significantly impact recognition performance, especially for the Background and Objective labels. To improve move recognition performance, we introduce a method and construct a refined corpus based on PubMed, named RCMR 280k. This corpus comprises approximately 280,000 structured abstracts totaling 3,386,008 sentences, with each sentence labeled with one of five categories: Background, Objective, Method, Result, or Conclusion. We also construct a subset of RCMR, named RCMR_RCT, corresponding to the medical subdomain of RCTs. We conduct comparison experiments using our RCMR and RCMR_RCT against PubMed 380k and PubMed 200k RCT, respectively. The best results, obtained using the MSMBERT model, show that: (1) our RCMR outperforms PubMed 380k by 0.82%, while our RCMR_RCT outperforms PubMed 200k RCT by 9.35%; (2) compared with PubMed 380k, our corpus achieves greater improvement on the Results and Conclusions categories, with average F1 scores improving by 1% and 0.82%, respectively; (3) compared with PubMed 200k RCT, our corpus significantly improves performance in the Background and Objective categories, with average F1 scores improving by 28.31% and 37.22%, respectively. To the best of our knowledge, our RCMR is among the few high-quality, resource-rich refined PubMed corpora available. The work in this paper has been applied in SciAIEngine, which is openly accessible for researchers to conduct move recognition tasks.
Keywords: refined corpus; move recognition; sequential sentence classification; corpus construction; corpus analysis
Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach (cited by: 1)
2
Authors: Qinjun Qiu, Miao Tian, Zhong Xie, Yongjian Tan, Kai Ma, Qingfang Wang, Shengyong Pan, Liufeng Tao. Journal of Earth Science (SCIE, CAS, CSCD), 2023, No. 5, pp. 1406-1417 (12 pages)
Artificial intelligence (AI) is the key to mining and enhancing the value of big data, and the knowledge graph is one of the important cornerstones of artificial intelligence, serving as the core foundation for the integration of statistical and physical representations. Named entity recognition is a fundamental research task for building knowledge graphs and needs to be supported by a high-quality corpus, yet there is currently a lack of high-quality named entity recognition corpora in the field of geology, especially in Chinese. In this paper, based on the conceptual structure of geological ontology and an analysis of the characteristics of geological texts, a classification system of geological named entity types is designed with the guidance and participation of geological experts, a corresponding annotation specification is formulated, an annotation tool is developed, and the first named entity recognition corpus for the geological domain is annotated based on real geological reports. The total number of words annotated was 698 512 and the number of entities was 23 345. The paper also explores the feasibility of a model pre-annotation strategy and presents a statistical analysis of the distribution of technical and term categories across genres and of the consistency of corpus annotation. Based on this corpus, A Lite Bidirectional Encoder Representations from Transformers (ALBERT)-Bi-directional Long Short-Term Memory (BiLSTM)-Conditional Random Fields (CRF) model and an ALBERT-BiLSTM model are selected for experiments. The results show that the F1-scores of the two models reach 0.75 and 0.65, respectively, providing a corpus basis and technical support for information extraction in the field of geology.
Keywords: ontology; geological reports; named entity recognition; geological corpus construction; semi-automated annotation platforms; deep learning