Abstract
In natural language processing, entity and relation extraction is an indispensable step in tasks such as knowledge graph construction, question answering system design, and semantic analysis. Most information in the field of traditional Chinese medicine (TCM) is stored as unstructured text, and extracting key information from TCM texts plays an important role in mining the experience of renowned veteran TCM practitioners. However, TCM texts often suffer from imbalanced samples and from synonymy in entity relations, where several different expressions carry the same meaning, for example multiple diagnostic results pointing to the same syndrome. To address these problems, a SimBERT-based relation extraction model is built under a semi-supervised learning framework to extract entity relations from TCM texts. SimBERT's similar-text generation is used for text augmentation to address sample imbalance, and its similar-sentence retrieval handles the synonymy problem well. Experimental results show that, on the TCM medical case dataset constructed for this study, the SimBERT model under the semi-supervised learning framework extracts entity relations from TCM texts more accurately.
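The two uses of SimBERT named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `embed` is a stand-in character n-gram encoder rather than SimBERT, and `paraphrase`, `balance`, and `retrieve` are hypothetical names introduced only for illustration. The sketch assumes that augmentation tops up minority relation classes with generated paraphrases, and that synonymy is handled by mapping a variant expression to its most similar canonical label.

```python
from collections import Counter, defaultdict
from math import sqrt


def embed(text):
    """Stand-in encoder (character unigram and bigram counts); NOT SimBERT."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return Counter(list(text) + bigrams)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, candidates):
    """Return the (candidate, score) pair most similar to the query."""
    q = embed(query)
    return max(((c, cosine(q, embed(c))) for c in candidates), key=lambda p: p[1])


def balance(samples, paraphrase):
    """Top up every minority class with paraphrased copies of its own samples.

    `paraphrase` is a hypothetical callable standing in for a text generator.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append(text)
    target = max(len(texts) for texts in by_label.values())
    augmented = list(samples)
    for label, texts in by_label.items():
        for i in range(target - len(texts)):
            augmented.append((paraphrase(texts[i % len(texts)]), label))
    return augmented
```

In a real pipeline, `embed` and `paraphrase` would be replaced by calls to a trained similarity and generation model; the retrieval and balancing logic around them would stay the same.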
Authors
LIU Xiao (刘逍)
GONG Qing-yue (龚庆悦)
LI Tie-jun (李铁军)
WANG Hong-yun (王红云)
College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210046, China; The Second Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Second Hospital of Traditional Chinese Medicine), Nanjing 210017, China
Source
Software Guide (《软件导刊》)
2022, No. 11, pp. 12-18 (7 pages)
Keywords
relation extraction
SimBERT
TCM medical cases