Abstract
[Objective] To improve the performance of PubMedBERT on chemical-induced disease (CID) entity relation classification. [Methods] We propose an entity relation classification method that combines PubMedBERT with Text-CNN. The entity pair and the text are joined into a sentence pair and used as input. The pre-trained PubMedBERT model encodes the CID-related text to obtain global features, and Text-CNN then captures locally important information from the text to decide whether the entity pair holds a CID relation. [Results] On the BioCreative V CDR dataset, the method reached a precision, recall, and F1 value of 78.3%, 73.5%, and 75.8%, respectively, which are at least 3.1%, 1.5%, and 3.3% higher than those of other methods. [Limitations] Only CID text corpora were considered; performance on clinical and other corpora remains to be tested. [Conclusions] The method captures the features of CID texts and improves entity relation classification.
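As a quick consistency check on the reported results, the F1 value can be recomputed from the stated precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only; the function name `f1_score` is a hypothetical helper and not part of the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the BioCreative V CDR dataset (percent).
reported_precision = 78.3
reported_recall = 73.5

# Recompute F1 and round to one decimal, matching the abstract's precision.
recomputed_f1 = round(f1_score(reported_precision, reported_recall), 1)
print(recomputed_f1)
```

The recomputed value agrees with the 75.8% F1 stated in the abstract.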
Authors
董淼
苏中琪
周晓北
兰雪
崔志刚
崔雷
Dong Miao; Su Zhongqi; Zhou Xiaobei; Lan Xue; Cui Zhigang; Cui Lei (Financial Section, China Medical University, Shenyang 110122, China; China Medical University Library, Shenyang 110122, China; Institute of Health Sciences, China Medical University, Shenyang 110122, China; School of Health Management, China Medical University, Shenyang 110122, China; Nursing School, China Medical University, Shenyang 110122, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
PKU Core (Peking University Core Journals)
2021, No. 11, pp. 145-152 (8 pages)
Data Analysis and Knowledge Discovery