Abstract
To better discriminate between samples of different categories and improve classification performance, this paper proposes KA-BERT, a drug-drug interaction (DDI) extraction model that combines category keywords with an attention mechanism. First, keywords for each category are selected using the chi-square test and document frequency. Position encodings of the keywords and the drug pair are then added to the pre-trained BERT model to make the differences between samples more salient, and an attention mechanism learns the distribution of the keywords relative to the other words in the sentence. To address the large number of negative samples in the DDI extraction task, a rule- and pattern-based negative sample filtering method is proposed, which reduces the imbalance between positive and negative samples. Compared with existing CNN-based, LSTM-based, and BERT-based DDI extraction models, KA-BERT achieves better extraction performance, which demonstrates its effectiveness. On chemical-protein relation extraction, KA-BERT also shows clear improvements in precision, recall, and F1 score, which further supports the effectiveness and generality of the model.
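The keyword selection step can be illustrated with a minimal sketch. The abstract states only that keywords are chosen per category using the chi-square test and document frequency; how the two criteria are combined is not given here, so the version below assumes a simple scheme: discard terms whose in-class document frequency is below a threshold, then rank the rest by chi-square score. The names `chi_square`, `select_keywords`, `top_k`, and `min_df`, and the toy sentences, are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: samples in the class that contain the term
    b: samples outside the class that contain the term
    c: samples in the class that lack the term
    d: samples outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_keywords(samples, top_k=2, min_df=2):
    """Pick up to top_k keywords per class from (tokens, label) pairs.

    Assumed combination rule (not from the paper): drop terms whose
    in-class document frequency is below min_df, keep only terms that
    are positively associated with the class, rank by chi-square.
    """
    class_count = Counter(label for _, label in samples)
    df = Counter()                       # document frequency over all samples
    df_in_class = defaultdict(Counter)   # document frequency per class
    for tokens, label in samples:
        for term in set(tokens):
            df[term] += 1
            df_in_class[label][term] += 1

    n = len(samples)
    keywords = {}
    for label, n_c in class_count.items():
        scored = []
        for term, a in df_in_class[label].items():
            if a < min_df:
                continue
            b = df[term] - a
            c = n_c - a
            d = n - n_c - b
            if a * d <= b * c:           # not positively associated
                continue
            scored.append((chi_square(a, b, c, d), term))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        keywords[label] = [term for _, term in scored[:top_k]]
    return keywords


# Toy, made-up sentences with drug mentions already masked.
samples = [
    (["druga", "increases", "effect", "drugb"], "effect"),
    (["druga", "increases", "toxicity", "drugb"], "effect"),
    (["druga", "should", "not", "be", "combined", "drugb"], "advise"),
    (["druga", "should", "be", "avoided", "drugb"], "advise"),
]
keywords = select_keywords(samples)
```

Terms that occur in every class, such as the masked drug placeholders, score zero and are dropped, which is the behaviour one would want from category-specific keywords.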
Authors
IKA Novita Dewi; CAI Xiaoling; LIU Xiaofeng; DONG Shoubin (School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China)
Source
Journal of South China University of Technology (Natural Science Edition)
Indexed in: EI, CAS, CSCD, PKU Core
2021, Issue 1, pp. 10-17 (8 pages)
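Funding
National Natural Science Foundation of China (61976239)
Zhongshan Innovation Special Fund for Introducing High-End Scientific Research Institutions (2019AG031)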