Abstract
Medical knowledge graph question answering combines medical knowledge with natural language processing to provide accurate, fast question-answering services for medical practitioners and patients. As data volumes surge, however, existing Chinese medical knowledge graphs are not comprehensive enough, and medical questions are complex and ambiguous, so accurately identifying entity information and generating easily understood answers remains challenging. This paper proposes a medical knowledge graph question-answering framework based on hybrid dynamic masking and multi-strategy fusion. First, a medical knowledge graph containing 34167 entities and 297463 relationships is constructed by integrating public datasets with disease knowledge from medical platforms, covering categories such as diseases, medications, and food. Second, a BERT-MaskAttention-BiLSTM-CRF hybrid dynamic masking model is proposed to accurately identify medical entities in the input, focusing more effectively on important content and removing interference from redundant information. Finally, an entity alignment strategy unifies and standardizes medical entities, an intent recognition strategy captures the user's query intent, and a large language model polishes the output of the knowledge graph so that answers are easier to understand. Experimental results show that the model achieves a macro-average F1 score of 0.9602 in the entity recognition comparison experiments and an average accuracy of 0.9656 in the question-answering tests, and that the generated content is easier to understand and highly interpretable.
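For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-average F1 score is generally computed: F1 is calculated separately for each category and the per-category scores are averaged with equal weight. The function name `macro_f1` and the example labels are illustrative assumptions. The abstract does not describe the paper's evaluation procedure, which may score entity spans rather than individual labels.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over all labels that occur in gold or pred.

    Illustrative only: this scores one label per item, which is an
    assumption and not necessarily the evaluation used in the paper.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    # Every label contributes equally, regardless of how often it occurs.
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical category labels, loosely following the categories in the abstract.
gold = ["disease", "drug", "food", "disease"]
pred = ["disease", "drug", "disease", "disease"]
score = macro_f1(gold, pred)
```

Because each category carries equal weight, a rare category that is recognized poorly lowers the macro average as much as a frequent one would.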
Authors
WANG Runzhou (王润周); ZHANG Xinsheng (张新生)
School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Indexed in: CSCD; PKU Core (北大核心)
2024, Issue 10, pp. 2770-2786 (17 pages)
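Funding
Shaanxi Province Key Industry Innovation Chain (Cluster) Project, Industrial Field (2022ZDLGY06-04)
Joint Research Project on Major Theoretical and Practical Issues of the Shaanxi Social Science Community (2022HZ1522)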
Keywords
hybrid dynamic masking
multi-strategy fusion
knowledge graph
medical question-answering
large language model