Research on entity and relation extraction from traditional Chinese medicine knowledge graphs based on GPTs
Abstract

Objective: To study a method for using custom generative pre-trained transformers (GPTs) to extract entities and relations from ancient traditional Chinese medicine (TCM) literature and build knowledge graphs from them, and to explore the potential advantages of this approach for organizing and mining ancient TCM texts.

Methods: A subset of the data in Zhonghua Yifang (《中华医方》) served as the case study. First, commonly used GPTs products from China and abroad were tested and compared on their ability to process ancient TCM text, and a suitable large language model was selected to extract entities and relations from Zhonghua Yifang, with the results output as an entity table and a relation table. These tables were then imported into the knowledge graph tool neo4j to generate a TCM knowledge graph. Finally, the deep learning models CasRel and GPLinker were compared with GPTs for TCM knowledge graph construction, and their performance was evaluated.

Results: GPTs built on ChatGPT 4.0 outperformed the other large language models in identifying and extracting entities and relations from Zhonghua Yifang. Compared with the traditional deep learning models, they also scored higher on all three metrics: precision (P), recall (R), and F1 (the harmonic mean of P and R).

Conclusions: Using GPTs for entity recognition and relation extraction from ancient TCM texts gives promising results and reduces the manual corpus annotation and model training that traditional methods require, so the approach is expected to be adopted rapidly in the TCM field. The method also has limitations: prompts are complex to write, the output may omit information, and the context window is too short for longer texts. Future research should further optimize strategies for using large language models to improve the efficiency and accuracy of TCM knowledge graph construction.
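The abstract reports precision, recall, and F1 but does not say how extracted relations were matched against the reference annotations. The sketch below assumes one common convention, not necessarily the authors': each relation-table row is a (head, relation, tail) triple, a prediction counts as correct only on an exact match with a gold triple, and scores are micro-averaged over distinct triples. The `Triple` class, the `prf1` function, the relation label `contains`, and the sample data are all hypothetical illustrations.

```python
from typing import Iterable, NamedTuple, Tuple


class Triple(NamedTuple):
    """One row of a relation table: (head entity, relation, tail entity)."""

    head: str
    relation: str
    tail: str


def prf1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact triple matching.

    Assumption: duplicate triples are collapsed first, so each distinct triple
    counts once (a micro-average over the whole test set).
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    true_pos = len(pred_set & gold_set)  # triples that match the gold standard exactly
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations: the four herbs of Mahuang Tang.
gold = [
    Triple("Mahuang Tang", "contains", "Mahuang"),
    Triple("Mahuang Tang", "contains", "Guizhi"),
    Triple("Mahuang Tang", "contains", "Xingren"),
    Triple("Mahuang Tang", "contains", "Gancao"),
]
# Hypothetical model output: one omission (Xingren) and one wrong herb (Dahuang).
predicted = [
    Triple("Mahuang Tang", "contains", "Mahuang"),
    Triple("Mahuang Tang", "contains", "Guizhi"),
    Triple("Mahuang Tang", "contains", "Gancao"),
    Triple("Mahuang Tang", "contains", "Dahuang"),
]

if __name__ == "__main__":
    p, r, f1 = prf1(predicted, gold)
    print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.75 R=0.75 F1=0.75
```

The omitted herb lowers recall and the wrong herb lowers precision, which mirrors the "information omission" limitation noted in the conclusions. Exact matching is strict; an evaluation that scored entities separately or accepted partial matches would yield different numbers.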
Authors: HE Yuhao, LI Ming, LUO Xiaolan, LIU Lili, YANG Qi, ZHU Bangxian, LYU Yuhan (Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China; Pepperdine University, Malibu, CA 90263, USA)
Source: Shanghai Journal of Traditional Chinese Medicine (《上海中医药杂志》, CSCD-indexed), 2024, No. 8, pp. 1-6 (6 pages)
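Funding: Major Project of the National Social Science Fund of China (19ZDA301); Project of the Nanjing Municipal Healthcare Security Administration, Jiangsu Province (JSDY-2024F015).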
Keywords: artificial intelligence; large language models; custom generative pre-trained transformer models; traditional Chinese medicine; knowledge graph; ancient literature